Towards User Profiling for Web Recommendation


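Key Technologies for Personalized Recommendation in AI Recommender Systems

Artificial intelligence (AI) already plays an important role in today's society, particularly in recommender systems. With the rapid growth of the Internet and the explosion of information, people increasingly need personalized recommendation services to help them filter and select information. The key technologies behind personalized recommender systems are among the important applications of AI. This article discusses those key technologies and describes how they are applied in practice.

1. Data collection and preprocessing

The core of a personalized recommender system is to predict a user's interests and needs by analyzing the user's past behavior and interests. Data collection and preprocessing are therefore key to such a system. Common data sources include user behavior records, user profile information and social networks. Preprocessing mainly covers data cleaning, data integration and feature extraction. Through data collection and preprocessing, a personalized recommender system can build a user profile that captures the user's interests and needs accurately.

2. Collaborative filtering

Collaborative filtering is a commonly used method in personalized recommender systems. It recommends items that a user may be interested in on the basis of similarities between users and items. Collaborative filtering can be divided into user-based and item-based variants. User-based collaborative filtering makes recommendations by comparing the similarity of interests between users, whereas item-based collaborative filtering makes recommendations by comparing the similarity between items. The key points of collaborative filtering are how similarity is computed and how the recommendation results are evaluated.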
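The user-based variant described above can be illustrated with a minimal sketch. It assumes cosine similarity over sparse rating dictionaries and a similarity-weighted sum as the prediction rule; the function names `cosine` and `recommend` and the toy ratings are hypothetical and are not taken from the article.

```python
from math import sqrt


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(ratings: dict, user: str, top_n: int = 2) -> list:
    """Rank items the user has not rated by similarity-weighted neighbour ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical toy data: three users, four items.
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"b": 1, "d": 5},
}
print(recommend(ratings, "alice"))
```

An item-based variant would apply the same similarity function to item columns instead of user rows.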

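3. Deep learning

Deep learning is one of the most active techniques in AI and is widely used in personalized recommender systems. By building deep neural network models, it can automatically learn complex relationships between users and items. With deep learning, a recommender system can better understand users' interests and needs and provide more accurate and personalized recommendations. However, deep learning places high demands on computing resources and data volume, so the scalability and stability of the system must be considered carefully.

4. Fusion and optimization of recommendation algorithms

Different algorithms in a personalized recommender system have different strengths and suit different scenarios. Fusing and optimizing recommendation algorithms is therefore another key technology. Combining several recommendation algorithms makes it possible to exploit the strengths of each and to improve the accuracy and personalization of the results.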

Research on Personalized Recommendation Algorithm Based on User Interest

SUN Kelei, CHEN Andong
(School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China)

Abstract: Aiming at the problem that users' interests are not easy to capture with the collaborative filtering algorithm, a personalized recommendation algorithm based on changes in users' interests and the attribute characteristics of items is proposed. Users' interest preference factors are established from item attributes and user ratings within sliding time windows. The user's preference degree for an item is then derived from the attribute characteristics of the recommended item. Finally, recommendations are produced by combining the item preference degree with the predicted rating from the collaborative filtering algorithm. Experimental results show that the proposed algorithm accurately reflects changes in users' interests and the attributes of the items themselves, and that recommendation quality improves compared with the classical UserCF method.

Keywords: user interest; collaborative filtering; time window; personalized recommendation

CLC number: TP391  Document code: A  Article ID: 2095-8382(2017)01-065-05

With the arrival of the Web 2.0 era, the volume of data has grown explosively, and it has become difficult to find data of interest in the mass of information.


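Collaborative Filtering Recommendation Algorithm Based on Web Logs and Clustering

Computer Era, No. 10, 2011

ZHANG Xiaohui, WEI Zenghui
(Department of Information Engineering, Yellow River Conservancy Technical Institute, Kaifeng, Henan)

Abstract: Collaborative filtering is currently the most successful recommendation method in e-commerce, but it also suffers from data sparsity and […] the efficiency and accuracy of the recommendation algorithm. To address these problems, a method that introduces Web log analysis is proposed. Combined with user clustering and related techniques, it both solves the data sparsity problem and improves the accuracy of recommendation.

1.2 User identification

Because of proxy servers and firewalls, users cannot be identified by IP address alone […]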

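Research on User Pattern Recognition Based on Web Log Mining

[…] behavior habits, so that the design of the Web site can be improved. On the one hand this makes the site easier to use and strengthens personalized services; on the other hand, optimizing the site design can also improve server performance. This paper takes the BIRCH algorithm as its basis and optimizes and improves it, then applies the improved algorithm to Web user […]

[…] behavior patterns, and then, based on the discovered user behavior patterns, improve the design of the site and provide personalized information services.

1 Web log mining

A Web server log records all requests captured by the Web server and, following the standard W3C format, records the details of each client visit to the site. In general, a complete Web server log includes the following information: the IP address of the visiting client, the time of the access, the URL of the requested page, HTTP header information, the status returned by the Web server for the request, the referring URL of the request, the client's browser type […]

[…] clustering. It is a clustering algorithm for large-scale data sets. The algorithm has two main phases. In the first phase, as data objects are added one by one, a clustering feature tree is formed automatically and each data object is placed in the nearest leaf node (cluster). After a data object has been placed, if […]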

Collaborative Filtering Recommendation Algorithm for Web Customer Needs Based on Fuzzy Reasoning

Journal of Intelligence, Vol. 30, No. 1, Jan. 2011

ZHAO Hongxia, WANG Xinhai, YANG Jiaoping
(1. School of Marketing Management, Liaoning Technical University, Huludao, Liaoning; 2. School of Business Administration, Liaoning Technical University, Huludao, Liaoning)

Abstract: This paper proposes a collaborative filtering recommendation algorithm based on fuzzy reasoning. In this algorithm, the characteristic coefficient between fuzzy sets replaces the similarity coefficient of the traditional algorithm, and fuzzy reasoning replaces weighted-average prediction. Experiments show that the algorithm achieves good accuracy […]


Knowledge Graph Recommendation Model Integrating Attention Mechanism

Software Guide, Vol. 22, No. 3, Mar. 2023

LI Jun, NI Xiao-jun
(School of Computer Science, School of Software and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210000, China)

Abstract: Knowledge graphs have received extensive attention in the field of recommendation. They are often embedded in recommendation models as auxiliary information to alleviate the data sparsity and cold start problems of traditional recommendation algorithms. However, the input vectors of some models are relatively sparse, and the feature interactions between users and items are not fully explored, which makes the representations of users and items less accurate and degrades model performance. Therefore, a knowledge graph recommendation model with a fused attention mechanism (BAKR), based on FGCNN and MKR, is proposed. First, the Feature Generation module of FGCNN is used to extract feature vectors of users and items. Second, the knowledge graph is used to obtain the dependencies between entities and to embed the implied auxiliary information into the model, and the attention mechanism then redistributes the user's preference weights to better assist the recommendation task and improve recommendation performance. Finally, simulation experiments are carried out on the MovieLens-1M and Book-Crossing data sets. The results show that the model significantly improves recommendation accuracy.

Key words: recommendation system; knowledge graph; attention mechanism

DOI: 10.11907/rjdk.222429  CLC number: TP391.3  Document code: A  Article ID: 1672-7800(2023)003-0118-07

0 Introduction

With the development of the information society, the amount of data it generates continues to grow explosively [1]. The problem people face is no longer a lack of information but how to obtain the information users need (such as products, films and books) from massive data.


S. Zhang and R. Jarvis (Eds.): AI 2005, LNAI 3809, pp. 415–424, 2005.
© Springer-Verlag Berlin Heidelberg 2005

Towards User Profiling for Web Recommendation

Guandong Xu (1), Yanchun Zhang (1), and Xiaofang Zhou (2)

(1) School of Computer Science and Mathematics, Victoria University, PO Box 14428, VIC 8001, Australia
{xu,yzhang}@.au
(2) School of Information Technology & Electrical Engineering, University of Queensland, Brisbane QLD 4072, Australia
zxf@.au

Abstract. Collaborative recommendation is one of the most widely used recommendation approaches. It recommends items to a visitor by referring to the preferences of other users who are similar to the current user. User profiling over Web transaction data can capture such knowledge about user tasks or interests. With the discovered usage patterns, it becomes possible to recommend more preferred content to Web users, or to customize the Web presentation for visitors, via collaborative recommendation. It also helps to identify the underlying relationships among Web users, items and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on user profiling. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model the co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as the representatives of usage patterns. Moreover, the hidden task model is derived by characterizing the meaningful latent factor space. With the discovered user profiles, we then choose the best-matching profile, the one whose preferences are closest to those of the current user, and make collaborative recommendations based on the corresponding page weights in the selected user profile.
The preliminary experimental results on real-world data sets show that the proposed approach is capable of making recommendations accurately and efficiently.

1 Introduction

In recent years, the massive influx of information onto the World Wide Web has helped users not only to retrieve information but also to discover knowledge. However, Web users usually suffer from information overload because of the rapid growth in the amount of information on the Web. One approach to information overload is the recommendation system, which aims to help users locate the information they need or prefer. Typically, a Web recommendation system focuses on identifying Web users or objects, collecting information about users' preferences or interests, and adapting its service to satisfy the users' needs. In short, Web recommendation can be used to provide better-quality Web services and applications to users while they browse.

To date, the problem of recommending appropriate items from a data repository to users has been studied extensively, and two paradigms have emerged: content-based filtering and collaborative filtering. Content-based filtering systems such as WebWatcher [8] try to recommend items similar to those visited by a given user in the past, whereas collaborative filtering systems try to identify a user category whose tastes or preferences are close to those of the given user and recommend items that those users have rated in the past [6]. The former often uses traditional information filtering and information retrieval methods, while the latter employs user correlation or nearest-neighbor algorithms.
In particular, the collaborative filtering technique has been gradually adopted in Web recommendation applications and has achieved great success in recent years [5, 9].

Web usage mining, which exploits data mining methods such as the k-Nearest Neighbor algorithm (kNN) [5], Web user or page clustering [4, 11, 12], association rule mining [1] and sequential pattern mining [2] to create models based on the analysis of usage data, has recently been used to build Web recommendation systems. With the usage pattern knowledge discovered in the Web usage mining process, a Web recommendation system can generate usage-based user profiles as representatives of aggregate user behavior for collaborative recommendation. As a result, a variety of research communities have addressed this topic, and Web usage mining is becoming a promising approach for Web recommendation. To reveal the underlying relationships among Web objects, the Latent Semantic Analysis (LSA) technique has been incorporated into the Web usage mining process, and some LSA-based algorithms have been developed for Web recommendation [13, 14].

In this paper, we propose a Web recommendation framework based on user profiling. The usage pattern knowledge, in the form of user profiles derived from Web usage mining, is incorporated into the Web recommendation system to improve the efficiency of recommendation by predicting user-preferred content and customizing the presentation. During the pattern discovery stage, a probabilistic inference method based on the Probabilistic Latent Semantic Analysis (PLSA) model, a variant of LSA, is used to model the underlying relationships among the co-occurrence activities and to identify the latent task model in terms of latent semantic factors. Through Web user session clustering, we create user profiles as the representatives of usage patterns.
To make Web recommendations, we match the current active user's activity against the discovered patterns to find the most like-minded user category and, in turn, determine the pages of potential interest as the recommendation set, based on the visit probabilities exhibited by that type of user. We demonstrate the effectiveness of the proposed technique through experiments on real-world data sets. The evaluation results show that the usage-based approach is more applicable than some traditional techniques.

The rest of the paper is organized as follows. Section 2 introduces the Web usage mining process, focusing on how to model Web co-occurrence activities with PLSA. Section 3 presents the algorithms for discovering usage-based user profiles and latent factors. Section 4 proposes the Web recommendation framework based on the user profiling approach. Section 5 reports preliminary experiments on two real-world data sets and comparisons with traditional work, and Section 6 concludes and outlines future work.

2 Usage-Based User Profiling with PLSA

As discussed above, Web recommendation is the ultimate goal of Web usage mining conducted on the data collected in the Web server logs of a specific Web site. The whole procedure usually consists of three steps: data collection and preprocessing, pattern mining, and knowledge application. Figure 1 depicts the whole process.

Fig. 1. The process of Web Mining and Web Recommendation

2.1 Usage Data Representation

Before introducing the user profiling technique, we briefly discuss the construction of usage data. In general, a user's access interests are reflected by the varying degrees of visits to different Web pages during one session. Thus, we can represent a user session as a weighted vector of the pages visited by the user during a period.
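As an illustration of this representation, the sketch below turns preprocessed log records into a session-by-page weight matrix. The record layout `(session_id, page, weight)`, the function name `build_usage_matrix` and the sample data are assumptions made for the example; the weight may stand for a hit count or for seconds spent on the page.

```python
def build_usage_matrix(records: list) -> tuple:
    """Aggregate (session_id, page, weight) records into a weight matrix.

    Returns (sessions, pages, matrix) where matrix[i][j] is the total
    weight of pages[j] within sessions[i].
    """
    sessions = sorted({s for s, _, _ in records})
    pages = sorted({p for _, p, _ in records})
    s_idx = {s: i for i, s in enumerate(sessions)}
    p_idx = {p: j for j, p in enumerate(pages)}
    matrix = [[0.0] * len(pages) for _ in sessions]
    for s, p, w in records:
        matrix[s_idx[s]][p_idx[p]] += w  # repeated visits accumulate
    return sessions, pages, matrix


# Hypothetical records: session id, page URL, weight (e.g. seconds on page).
records = [
    ("s1", "/a", 2.0),
    ("s1", "/b", 1.0),
    ("s2", "/b", 3.0),
    ("s1", "/a", 1.0),
]
sessions, pages, matrix = build_usage_matrix(records)
print(matrix)
```

Each row of the result is one weighted page vector, so the whole list of rows plays the role of the usage matrix used in the rest of the section.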
In this paper, we use the following notation to model the co-occurrence activities of Web users and pages:

• $S = \{s_1, s_2, \ldots, s_m\}$: a set of $m$ user sessions.
• $P = \{p_1, p_2, \ldots, p_n\}$: a set of $n$ Web pages.
• For each user, the navigational session is represented as a sequence of visited pages with corresponding weights: $s_i = \{a_{i,1}, a_{i,2}, \ldots, a_{i,n}\}$, where $a_{i,j}$ denotes the weight of page $p_j$ visited in user session $s_i$. The weight is usually determined by the number of hits or the amount of time spent on the page. Here, we use both of them to construct usage data from two real-world data sets.
• $SP_{m \times n} = \{a_{i,j}\}$: the resulting usage data in the form of a weight matrix of dimensionality $m \times n$.

2.2 PLSA Model

The PLSA model is based on a statistical model called the aspect model, which can be used to identify the hidden semantic relationships among general co-occurrence activities. Similarly, in the context of Web usage mining we can view user sessions over the Web page space as co-occurrence activities and use the model to discover latent usage patterns. For the given aspect model, suppose that there is a latent factor space $Z = \{z_1, z_2, \ldots, z_k\}$ and that each co-occurrence observation $\langle s_i, p_j \rangle$ is associated with each factor $z_k \in Z$ to a varying degree.

Based on these assumptions and Bayes' rule, we calculate the probability of an observed pair $\langle s_i, p_j \rangle$ by introducing the latent factor variable $z_k$:

$$P(s_i, p_j) = \sum_{z_k \in Z} P(z_k)\, P(s_i \mid z_k)\, P(p_j \mid z_k) \tag{1}$$

Following the likelihood principle, the total likelihood is determined as

$$L_i = \sum_{s_i \in S,\, p_j \in P} m(s_i, p_j) \log P(s_i, p_j) \tag{2}$$

where $m(s_i, p_j)$ is the element of the session-page matrix corresponding to session $s_i$ and page $p_j$.

In order to maximize the total likelihood, we use the Expectation Maximization (EM) algorithm to perform maximum likelihood estimation of $P(z_k)$, $P(s_i \mid z_k)$ and $P(p_j \mid z_k)$ in the latent variable model [3].
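The estimation step can be sketched in a few lines of plain Python. This is a minimal illustration of the standard EM updates for the aspect model, not the authors' implementation: the function name `plsa_em`, the random initialisation, the fixed iteration count and the toy matrix are all assumptions.

```python
import math
import random


def plsa_em(counts, k, iters=50, seed=0):
    """Fit P(z), P(s|z), P(p|z) to a session-page weight matrix with EM.

    Returns (pz, ps_z, pp_z, history); history holds the log-likelihood of
    equation (2) evaluated at the start of each iteration.
    """
    rng = random.Random(seed)
    m, n = len(counts), len(counts[0])

    def normalised(size):
        v = [rng.random() + 1e-3 for _ in range(size)]
        total = sum(v)
        return [x / total for x in v]

    pz = normalised(k)                        # P(z_k)
    ps_z = [normalised(m) for _ in range(k)]  # P(s_i | z_k)
    pp_z = [normalised(n) for _ in range(k)]  # P(p_j | z_k)
    history = []
    for _ in range(iters):
        new_z = [0.0] * k
        new_s = [[0.0] * m for _ in range(k)]
        new_p = [[0.0] * n for _ in range(k)]
        loglik = 0.0
        for i in range(m):
            for j in range(n):
                c = counts[i][j]
                if c == 0:
                    continue
                # E-step: posterior P(z_k | s_i, p_j) via equation (1).
                joint = [pz[z] * ps_z[z][i] * pp_z[z][j] for z in range(k)]
                total = sum(joint)
                loglik += c * math.log(total)
                for z in range(k):
                    r = c * joint[z] / total
                    new_z[z] += r
                    new_s[z][i] += r
                    new_p[z][j] += r
        # M-step: renormalise the expected counts.
        grand = sum(new_z)
        for z in range(k):
            d = new_z[z] or 1.0  # guard against an empty factor
            ps_z[z] = [x / d for x in new_s[z]]
            pp_z[z] = [x / d for x in new_p[z]]
        pz = [x / grand for x in new_z]
        history.append(loglik)
    return pz, ps_z, pp_z, history


# Hypothetical 4-session, 4-page matrix with two obvious groups.
counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
pz, ps_z, pp_z, history = plsa_em(counts, k=2, iters=30)
print(history[0], history[-1])
```

Each iteration visits every non-zero cell once for each factor, which agrees with a per-iteration cost of order $mnk$ for a dense matrix.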
The E-step and M-step are repeated until $L_i$ converges to a local optimum, at which point the estimates can be taken as the final probabilities of the observed data. The computational complexity of this algorithm is $O(mnk)$, where $m$ is the number of user sessions, $n$ is the number of pages, and $k$ is the number of factors.

3 Discovery of Latent Factors and Usage-Based User Profiles

As discussed in Section 2, the estimated probabilities quantitatively measure the underlying relationships among Web users, pages and latent factors (i.e. tasks). It is therefore reasonable to identify the latent factors and discover the related usage-based access patterns through probabilistic inference. This section describes how to derive this usage information.

3.1 Characterizing Latent Factor

First, we discuss how to capture the latent factor associated with user navigational behavior. We do this by characterizing the “dominant” pages that contribute significantly to the factor. Note that $P(p_j \mid z_k)$ represents the conditional occurrence probability over the page space for a specific factor, whereas $P(z_k \mid p_j)$ reflects the conditional probability distribution over the factor space for a specific page. Thus, we may choose the pages whose conditional probabilities $P(z_k \mid p_j)$ and $P(p_j \mid z_k)$ are both greater than a predefined threshold to form the “dominant” page set. Exploring the contents of these pages characterizes the semantic meaning of each factor.
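The selection rule can be sketched as follows, assuming the estimates $P(z_k)$ and $P(p_j \mid z_k)$ are available as plain lists and that $P(z_k \mid p_j)$ is obtained from them by Bayes' rule. The function name `dominant_pages`, the single shared threshold value and the sample probabilities are hypothetical.

```python
def dominant_pages(pz, pp_z, threshold):
    """Return {factor index: [page indices]} for pages passing both tests.

    pz[z] is P(z) and pp_z[z][j] is P(p_j | z). A page is kept for factor z
    when both P(p_j | z) and P(z | p_j) exceed the threshold.
    """
    k, n = len(pz), len(pp_z[0])
    result = {}
    for z in range(k):
        chosen = []
        for j in range(n):
            evidence = sum(pz[y] * pp_z[y][j] for y in range(k))  # P(p_j)
            pz_given_p = pz[z] * pp_z[z][j] / evidence if evidence else 0.0
            if pp_z[z][j] > threshold and pz_given_p > threshold:
                chosen.append(j)
        result[z] = chosen
    return result


# Hypothetical estimates for two factors over four pages.
pz = [0.5, 0.5]
pp_z = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.1, 0.3, 0.6]]
print(dominant_pages(pz, pp_z, threshold=0.25))
```

Requiring both conditions filters out pages that are frequent under a factor but equally frequent under every other factor.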
Section 5.2 presents examples of latent factors together with the “dominant” pages derived from the two real data sets.

3.2 Building Usage-Based User Profiles

The set of $P(z_k \mid s_i)$ conceptually represents the probability distribution over the latent factor space for a specific user session $s_i$. We therefore construct the session-factor matrix from the calculated probability estimates to reflect the relationship between Web users and latent factors, expressed as follows:

$$s'_i = (b_{i,1}, b_{i,2}, \ldots, b_{i,k}) \tag{3}$$

where $b_{i,s}$ is the occurrence probability of session $s_i$ on factor $z_s$. In this way, the distance between two session vectors reflects the similarity of the navigational behavior they exhibit. We therefore define their similarity by applying the well-known cosine similarity:

$$sim(s'_i, s'_j) = \frac{s'_i \cdot s'_j}{\|s'_i\|_2 \, \|s'_j\|_2} \tag{4}$$

where $s'_i \cdot s'_j = \sum_{m=1}^{k} b_{i,m}\, b_{j,m}$, $\|s'_i\|_2 = \sqrt{\sum_{m=1}^{k} b_{i,m}^2}$ and $\|s'_j\|_2 = \sqrt{\sum_{m=1}^{k} b_{j,m}^2}$.

With the session similarity measure (4), we propose a modified k-means clustering algorithm [13] to partition user sessions into clusters. As each user session is represented as a weighted page vector, it is reasonable to take the centroid of each cluster as the usage pattern, in the form of a user profile. In this work, we compute the mean vector to represent the centroid. The algorithm for clustering user sessions and constructing user profiles is as follows:

Algorithm 1. Building User Profiles

Input: the set of conditional probabilities $P(z_k \mid s_i)$
Output: a set of user session clusters $SCL = \{SCL_1, SCL_2, \ldots, SCL_p\}$ and a set of user profiles $PF = \{PF_1, PF_2, \ldots, PF_p\}$

1. For all user sessions, apply the modified k-means clustering algorithm and output a set of usage-based session clusters $SCL = \{SCL_t\}$.
2. For each user session cluster, calculate the centroid of the cluster as
$$Cid_t = \frac{1}{|SCL_t|} \sum_{s_i \in SCL_t} s'_i \tag{5}$$
where $|SCL_t|$ is the number of sessions in the cluster.
3. Treat the centroid of each generated cluster as the aggregate user profile, and sort the normalized weights in descending order to reflect the relative “significance” of the corresponding pages within the selected user profile, i.e.
$$PF_t = \{\langle p^t_1, w^t_1 \rangle, \langle p^t_2, w^t_2 \rangle, \ldots, \langle p^t_n, w^t_n \rangle\} \tag{6}$$
where $w^t_j = \frac{1}{|SCL_t|} \sum_{s_i \in SCL_t} a_{i,j}$, $w^t_1 > w^t_2 > \cdots > w^t_n$, and $p^t_j \in P$.
4. Output $PF = \{PF_t\}$.

4 Using PLSA for Web Personalization

Generally, we recommend Web items to users in a customized or preferred style based on an analysis of the interests exhibited by individual users or groups of users. In this work, we adopt the model-based technique in our Web recommendation framework. We consider the usage-based user profiles generated in Section 3.2 as aggregated representatives of the common navigational behavior exhibited by all individuals in the same user category. For a new active user session, we use the cosine function to measure the similarity between the session and each discovered user profile. We then choose the closest profile, the one with the highest similarity to the current user session, as the pattern matched to the current user. Finally, we generate the top-N recommended pages based on the historical visit probabilities of pages by other users in the selected profile. The detailed procedure is as follows:

Algorithm 2. Web Recommendation Based on User Profiling

Input: an active user session and a set of user profiles
Output: the top-N recommendation pages

1. The active session and the profiles are simplified to $n$-dimensional weight vectors $s^a$ and $s^p$ over the page space of the site, instead of the page-weight pair vectors generated by Algorithm 1, i.e. $s^p = [w^p_1, w^p_2, \ldots, w^p_n]$, where $w^p_i$ is the significance weight contributed by page $p_i$ in this profile. Similarly, $s^a = [w^a_1, w^a_2, \ldots, w^a_n]$, where $w^a_i = 1$ if page $p_i$ has already been accessed and $w^a_i = 0$ otherwise.
2. Measure the similarities between the active session and all derived usage profiles, and choose the profile with the maximum similarity as the best-matching pattern:
$$sim(s^a, s^p_{mat}) = \max_j \big( sim(s^a, s^p_j) \big) = \max_j \left( \frac{s^a \cdot s^p_j}{\|s^a\|_2 \, \|s^p_j\|_2} \right) \tag{7}$$
3. Combine the selected profile $s^p_{mat}$ with the active session $s^a$, then calculate the recommendation score $rs(p_i)$ for each page $p_i$:
$$rs(p_i) = w^{mat}_i, \quad w^{mat}_i \in s^p_{mat} \tag{8}$$
Thus, each page in the profile is assigned a recommendation score between 0 and 1. Note that the recommendation score is 0 if the page has already been visited in the current session.
4. Sort the recommendation scores calculated in step 3 in descending order, i.e. $rs = (w^{mat}_1, w^{mat}_2, \ldots, w^{mat}_n)$, and select the $N$ pages with the highest recommendation scores to construct the top-N recommendation set:
$$REC(N) = \{\, p^{mat}_j \mid rs(p^{mat}_j) > rs(p^{mat}_{j+1}),\; j = 1, 2, \ldots, N,\; p^{mat}_j \in P \,\} \tag{9}$$

5 Experiments and Evaluations

In order to evaluate the effectiveness of the proposed PLSA-based method and to explore the discovered latent semantic factors, we have conducted preliminary experiments on two real-world data sets.

5.1 Data Sets

The first data set was downloaded from the KDDCUP Web site (/KDDCUP/). After data preparation, we set up an evaluation data set of 9308 user sessions and 69 pages, where every session consists of 11.88 pages on average. We refer to this data set as “KDDCUP data”. In this data set, the number of hits on a Web page by the given user determines the element of the session-page matrix associated with that page in the given session.

The second data set comes from the log files of an academic Web site [10]. The data is based on a 2-week Web log file from April 2002. After data preprocessing, the filtered data contains 13745 sessions and 683 pages. The entries in the usage data correspond to the amount of time (in seconds) spent on pages during a given session.
For convenience, we refer to this second data set as the “CTI data”.

5.2 Latent Factors Based on PLSA Model

We conduct experiments on the two data sets to extract the latent factors by identifying the “dominant” page sets. Here, we present the latent factors derived from the two real data sets with the PLSA model. Table 1 shows one example of the factors extracted from the KDDCUP data set together with its “dominant” page set, i.e. the pages whose probabilities exceed the predefined threshold, whereas Table 2 presents an example from the CTI data set. From these tables, it is easy to conclude that factor #6 in the KDDCUP data set reflects the online shopping process, whereas factor #13 in the CTI data set stands for the activity of searching for postgraduate program information.

Table 1. Example of latent factor and its associated pages from KDDCUP

Factor: #6 (online shopping process)

Page #   Content                    Page #   Content
27       main/login2                50       account/past_orders
32       main/registration          52       account/credit_info
42       account/your_account       60       checkout/thankyou
44       checkout/expresCheckout    64       account/create_credit
45       checout/confirm_order      65       main/welcome
47       account/address            66       account/edit_credit

Table 2. Example of latent factor and its associated pages from CTI

Factor: #13 (Postgrad-program)

Page #   Content                    Page #   Content
386      /News                      588      /Prog/2002/Gradect2002
575      /Programs                  590      /Prog/2002/Gradis2002
586      /Prog/2002/Gradcs2002      591      /Prog/2002/Gradmis2002
587      /Prog/2002/Gradds2002      592      /Prog/2002/Gradse2002

5.3 Evaluation Metric of User Session Clusters and Web Recommendation

In order to evaluate the quality of the clusters derived from the PLSA-based approach, we adopt one specific metric, named the Weighted Average Visit Percentage (WAVP) [8].
This evaluation method assesses each user profile individually according to the likelihood that a user session which contains any page of the session cluster will also include the remaining pages of the cluster during the same session. Suppose $T$ is a set of sessions within the evaluation set, and for a specific cluster $C$, let $T_c$ denote the subset of $T$ whose elements contain at least one page from $C$. The WAVP is then computed as:

$$WAVP = \left( \sum_{t \in T_c} \frac{t \cdot C}{|T_c|} \right) \Big/ \left( \sum_{p \in PF} wt(p, PF) \right)$$

where $wt(p, PF)$ is the weight of page $p$ in the profile $PF$ of cluster $C$.

In addition, we use a metric called hit precision [7] to measure the precision of top-N recommendation. Given a user session in the test set, we extract the first $j$ pages as an active user session and generate a top-N recommendation set via the procedure described in section 4. Since the recommendation set is in descending order, we then obtain the rank of the $(j+1)$-th page in the sorted recommendation list. Furthermore, for each rank $r > 0$, we count the number of test sessions whose $(j+1)$-th page ranks exactly $r$-th, denoted $Nb(r)$. Let $S(r) = \sum_{i=1}^{r} Nb(i)$ and $hitp = S(N)/|T|$, where $|T|$ is the number of test sessions in the whole test set. Thus, $hitp$ stands for the hit precision of Web recommendation.

In order to compare our approach with other existing methods, we implement a baseline method based on the clustering technique [11]. This method generates usage-based session clusters by performing k-means clustering directly on the usage data. The cluster centroids are then treated as the aggregated access patterns.

Fig. 2. WAVP comparison for CTI
Fig. 3. Hitp comparison for CTI

Figures 2 and 3 depict the comparison of WAVP and hitp, respectively, on the CTI data set for the two methods discussed above. The results demonstrate that the proposed PLSA-based technique consistently outperforms the standard clustering-based algorithm in terms of both WAVP and hit precision.
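The top-N procedure of Algorithm 2 and the hit precision metric described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: the function names `cosine`, `recommend` and `hit_precision`, the toy profiles, and the choice to skip test sessions that have no $(j+1)$-th page are all assumptions introduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(active, profiles, n):
    """Return up to n page indices: pick the closest profile (Eq. 7), score its
    pages by their profile weight with visited pages set to 0 (Eq. 8), and keep
    the n highest-scoring pages (Eq. 9)."""
    best = max(profiles, key=lambda p: cosine(active, p))
    scores = [0.0 if seen else w for seen, w in zip(active, best)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:n]

def hit_precision(sessions, profiles, j, n):
    """hitp = S(N) / |T|: share of test sessions whose (j+1)-th page appears
    in the top-n list built from their first j pages."""
    hits, total = 0, 0
    for pages in sessions:              # each session is an ordered list of page indices
        if len(pages) <= j:
            continue                    # assumption: sessions without a (j+1)-th page are skipped
        total += 1
        active = [0] * len(profiles[0])
        for p in pages[:j]:
            active[p] = 1               # binary active-session vector, as in step 1
        if pages[j] in recommend(active, profiles, n):
            hits += 1
    return hits / total if total else 0.0

# Toy data (hypothetical): two profiles over a four-page site, three test sessions.
profiles = [[0.9, 0.8, 0.1, 0.0], [0.0, 0.1, 0.7, 0.9]]
sessions = [[0, 1], [3, 2], [0, 2]]
print(recommend([1, 0, 0, 0], profiles, 2))
print(hit_precision(sessions, profiles, j=1, n=1))
```

Counting the sessions whose held-out page falls within the first N ranks is equivalent to summing $Nb(r)$ for $r = 1, \ldots, N$, so the sketch does not need to store the individual rank counts.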
These comparisons indicate that our approach makes Web recommendations more accurately and effectively than the conventional method. In addition to recommendation, this approach is able to identify the hidden factors that explain why such user sessions or Web pages are grouped together in the same category.

6 Conclusion and Future Work

In this paper, we have developed a Web recommendation framework incorporating a user profiling technique based on the PLSA model. With the proposed probabilistic method, we can measure the co-occurrence activities (i.e. user sessions) in terms of probability estimations to capture the underlying relationships among Web users, pages and latent tasks. Analysis of the estimated probabilities allows us to build usage-based user profiles and to identify the hidden factors associated with the corresponding interests or patterns. The discovered usage patterns, in the form of user profiles, are used to make collaborative recommendations, which in turn improves the precision and effectiveness of Web recommendation. We have demonstrated the efficiency of our technique through preliminary experiments performed on real-world data sets and through comparison with other existing work.

Our future work will focus on the following issues: we intend to identify the primitive task of the active user, to incorporate Web page categories to predict the pages a user may potentially visit, and to carry out more experiments to validate the scalability of our approach.

References

1. R. Agarwal, C. Aggarwal and V. Prasad, A Tree Projection Algorithm for Generation of Frequent Itemsets, Journal of Parallel and Distributed Computing, 61 (1999), pp. 350-371.
2. R. Agrawal and R. Srikant, Mining Sequential Patterns, in P. S. Yu and A. S. P. Chen, eds., Proceedings of the International Conference on Data Engineering (ICDE), IEEE Computer Society Press, Taipei, Taiwan, 1995, pp. 3-14.
3. A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B, 39 (1977), pp. 1-38.
4. E. Han, G. Karypis, V. Kumar and B. Mobasher, Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results, IEEE Data Engineering Bulletin, 21 (1998), pp. 15-22.
5. J. Herlocker, J. Konstan, A. Borchers and J. Riedl, An Algorithmic Framework for Performing Collaborative Filtering, Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (SIGIR'99), Berkeley, CA, 1999.
6. J. L. Herlocker, J. A. Konstan, L. G. Terveen and J. T. Riedl, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS), 22 (2004), pp. 5-53.
7. X. Jin, Y. Zhou and B. Mobasher, A Unified Approach to Personalization Based on Probabilistic Latent Semantic Models of Web Usage and Content, Proceedings of the AAAI 2004 Workshop on Semantic Web Personalization (SWP'04), San Jose, 2004.
8. T. Joachims, D. Freitag and T. Mitchell, Webwatcher: A tour guide for the world wide web, The 15th International Joint Conference on Artificial Intelligence (IJCAI'97), Nagoya, Japan, 1997, pp. 770-777.
9. J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon and J. Riedl, Grouplens: Applying Collaborative Filtering to Usenet News, Communications of the ACM, 40 (1997), pp. 77-87.
10. B. Mobasher, Web Usage Mining and Personalization, in M. P. Singh, ed., Practical Handbook of Internet Computing, CRC Press, 2004.
11. B. Mobasher, H. Dai, M. Nakagawa and T. Luo, Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization, Data Mining and Knowledge Discovery, 6 (2002), pp. 61-82.
12. M. Perkowitz and O. Etzioni, Adaptive Web Sites: Automatically Synthesizing Web Pages, Proceedings of the 15th National Conference on Artificial Intelligence, AAAI, Madison, WI, 1998, pp. 727-732.
13. G. Xu, Y. Zhang and X. Zhou, A Latent Usage Approach for Clustering Web Transaction and Building User Profile, The First International Conference on Advanced Data Mining and Applications (ADMA 2005), Springer, Wuhan, China, 2005, pp. 31-42.
14. G. Xu, Y. Zhang and X. Zhou, Using Probabilistic Semantic Latent Analysis for Web Page Grouping, 15th International Workshop on Research Issues on Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'2005), Tokyo, Japan, 2005.