

Summary of Logistic Regression

3.3 Minimizing J(θ) by Gradient Descent

The minimum of $J(\theta)$ can be found with gradient descent, which gives the update rule

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta), \qquad (j = 0, \ldots, n)$$

where $\alpha$ is the learning rate (step size). The partial derivative is computed as follows:

$$
\frac{\partial}{\partial \theta_j} J(\theta)
= -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\frac{1}{h_\theta(x^{(i)})} - (1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})} \right]\frac{\partial}{\partial \theta_j} g(\theta^T x^{(i)})
$$

Using the sigmoid derivative $g'(z) = g(z)\bigl(1-g(z)\bigr)$:

$$
= -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\frac{1}{g(\theta^T x^{(i)})} - (1-y^{(i)})\frac{1}{1-g(\theta^T x^{(i)})} \right] g(\theta^T x^{(i)})\bigl(1-g(\theta^T x^{(i)})\bigr)\, x_j^{(i)}
$$

$$
= -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\bigl(1-g(\theta^T x^{(i)})\bigr) - (1-y^{(i)})\, g(\theta^T x^{(i)}) \right] x_j^{(i)}
= \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)} \tag{11}
$$
The for loop above runs m times, so the update is not fully vectorized, unlike the code in Machine Learning in Action (《机器学习实战》), where a single statement completes the update of θ. Below is my understanding of the vectorization used in that book's code. Adopt the following matrix convention for the training data: each row of x is one training sample, and each column is one feature.
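The fully vectorized update can be sketched in NumPy as follows. This is a minimal illustration, not the book's exact code; the function and variable names are my own:

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=500):
    """Batch gradient descent for logistic regression, one statement per update.

    X: (m, n) design matrix (include a column of ones for an intercept).
    y: (m,) vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        error = sigmoid(X @ theta) - y        # (m,) residuals h_theta(x) - y
        theta -= alpha * (X.T @ error) / m    # vectorized form of (11)
    return theta
```

The inner statement `X.T @ error` computes all m terms of the sum in (11) for every j at once, replacing the explicit loop over samples.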
The Cost function (5) and the J function (6) are in fact derived by maximum likelihood estimation. The derivation is as follows. The two cases of (4) can be combined into a single expression:

$$P(y \mid x; \theta) = \bigl(h_\theta(x)\bigr)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y}$$

Take the likelihood function to be:

$$L(\theta) = \prod_{i=1}^{m} P\bigl(y^{(i)} \mid x^{(i)}; \theta\bigr) \tag{7}$$
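Taking the logarithm of $L(\theta)$ and negating it yields the average cross-entropy loss. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def neg_log_likelihood(theta, X, y):
    """J(theta): average negative log-likelihood of logistic regression.

    X: (m, n) design matrix, y: (m,) vector of 0/1 labels.
    """
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x^(i)) for every sample
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```

With theta at zero every prediction is 0.5, so the loss equals log 2, a handy sanity check when debugging an implementation.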

The Logistic Regression Method
Contents
1. Overview of linear regression
2. The principle of logistic regression
3. Applications of logistic regression
4. Advantages and disadvantages of logistic regression

Main text
Linear regression is a common statistical method for studying the relationship between a dependent variable and one or more independent variables. The dependent variable is usually continuous, while the independent variables may be continuous or discrete. When the dependent variable is binary or multi-class, however, linear regression no longer applies, and logistic regression is introduced instead.

Logistic regression is a statistical method for classification problems whose principle rests on the logistic function. The logistic function is an S-shaped function whose values lie between 0 and 1, so it can represent the probability that an event occurs. In logistic regression, the independent variables are passed through the logistic function to obtain a probability, and this probability determines the class of the dependent variable.
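The probability-then-threshold idea can be shown in a few lines of Python (names and the 0.5 threshold are illustrative defaults):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Turn the probability sigmoid(z) into a class label, 0 or 1."""
    return 1 if sigmoid(z) >= threshold else 0
```

Here z stands for the linear combination of the independent variables; large positive z maps to a probability near 1, large negative z to a probability near 0.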

Logistic regression is widely applied to binary and multi-class problems such as credit risk assessment, disease prediction, and marketing, and it is used across many fields, including finance, health care, and education.

Despite its many advantages, logistic regression also has drawbacks. First, it is ill-suited to settings with very many independent variables or very little data, which easily lead to overfitting. Second, fitting the model requires iterative numerical optimization and some specialized mathematics, and it can be computationally demanding. Overall, logistic regression is an important classification method with broad prospects for application.

Logistic Regression

Comparing the Linear Probability and Logit Models

The logistic regression model is simply a non-linear transformation of the linear regression model.

Why Use Logistic Regression?

In ordinary linear regression, if you move far enough along the X-axis, the predicted values will become greater than 1 or less than 0.
Logistic Regression: Learning Objectives

Understand the applications of logistic regression.
Understand the three types of logistic regression:
1. Binary response variables
2. Ordinal response variables
3. Nominal response variables

Linear regression techniques are used with continuous output variables.

The Logistic Regression Algorithm
Logistic regression is a statistical learning method for binary classification. It is based on a non-linear regression model expressed through the logistic (sigmoid) function. Its goal is to assign each sample to one of two discrete classes based on the input features. Unlike linear regression, logistic regression outputs a probability estimate between 0 and 1.

The implementation steps of logistic regression are as follows:

1. Collect a training data set containing input features and the corresponding class labels.
2. Prepare the data, for example by normalizing features or handling missing values.
3. Define the logistic function, which maps the input features to a probability; the sigmoid function is the usual choice.
4. Define the loss function, typically derived from maximum likelihood estimation, and minimize it with an optimization algorithm such as gradient descent.
5. Train the model: iterate the optimization algorithm to find the parameter values that minimize the loss.
6. Predict new samples: use the learned parameters and the logistic function to map the input features to a probability, then threshold it to assign the positive or negative class.
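The steps above can be strung together in a short NumPy sketch. This is a minimal illustration under my own naming; the learning rate, iteration count, and 0.5 threshold are illustrative defaults:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, n_iters=2000):
    # Step 2: normalize features (guard against zero-variance columns).
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0
    Xn = np.c_[np.ones(len(X)), (X - mu) / sigma]  # add intercept column
    # Steps 3-5: sigmoid model, cross-entropy loss, gradient descent.
    w = np.zeros(Xn.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(Xn @ w)))   # Step 3: predicted probabilities
        w -= lr * Xn.T @ (p - y) / len(y)     # Step 5: one gradient step
    return w, mu, sigma

def predict(X, w, mu, sigma, threshold=0.5):
    # Step 6: map new samples to probabilities and threshold them.
    Xn = np.c_[np.ones(len(X)), (X - mu) / sigma]
    p = 1.0 / (1.0 + np.exp(-(Xn @ w)))
    return (p >= threshold).astype(int)
```

Note that the normalization statistics learned in training (mu, sigma) must be reused at prediction time, which is why `train_logistic` returns them.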

Logistic regression can be applied to both linearly separable and linearly non-separable data sets. Prediction is computationally cheap, and the model is simple, easy to interpret, and easy to understand. However, its capacity to learn non-linear relationships is weak, so it may be inaccurate on non-linear problems.

Chapter 17: Logistic Regression Analysis

Prof. Tu Ping, Guanghua School of Management, Peking University (2018/10/5)
Discussion: the male guests on If You Are the One (非诚勿扰)

What possible outcomes does each male guest face when he appears? What are the main factors that may influence the outcome? How can we measure the effect of these factors on the outcome? Can the outcome be predicted?
Comparison with ordinary linear regression

When the dependent variable is binary, the key assumptions of ordinary linear regression are not satisfied.
Estimating the regression coefficients

Logistic regression coefficients are generally estimated by the maximum likelihood method, usually implemented iteratively. Variable selection is typically carried out stepwise, based on the maximum likelihood ratio (MLR) or an approximate covariance estimate (ACE).
Interpreting the regression coefficients

The intercept β₀ determines the probability of the event when all other variables are 0. The coefficient βⱼ describes the relationship between the event probability and variable j:

q = 1 / [1 + exp(−(β₀ + Σⱼ βⱼxⱼ))]   (6)
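The interpretation of β₀ and βⱼ can be checked numerically. The sketch below assumes the common sign convention in which the modeled quantity is the event probability; the function names are illustrative:

```python
import math

def event_probability(beta0, betas, xs):
    """q = 1 / (1 + exp(-(beta0 + sum_j beta_j * x_j))), the event probability."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    """Odds of the event: p / (1 - p)."""
    return p / (1.0 - p)
```

A one-unit increase in xⱼ multiplies the odds of the event by exp(βⱼ), which is why βⱼ > 0 corresponds to a positive association and βⱼ < 0 to a negative one.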
II. Analysis Steps

Define the problem; estimate the regression coefficients; test the significance of the coefficients; interpret the results; evaluate the model.

Defining the problem includes:

Identifying the dependent variable
• e.g., purchase, sign-up, churn, complaint, account opening, default…

Identifying the predictor variables
• i.e., the variables believed to influence the dependent variable, chosen mainly from the literature and experience

βⱼ > 0: positive association; βⱼ < 0: negative association; βⱼ = 0: no association.
The dependent variable is not normally distributed, and the predictions of a multiple linear equation are not guaranteed to lie between 0 and 1. In this case binary logistic regression, logistic regression for short, can be used.

Logistic Regression

Logistic Regression
Marc Deisenroth
@Imperial College London, February 8, 2019
Binary Classification

[Figure: two-class training data plotted in the (x1, x2) plane]
§ Supervised learning setting with inputs $x_n \in \mathbb{R}^D$ and binary targets $y_n \in \{0, 1\}$ belonging to classes $\mathcal{C}_1, \mathcal{C}_2$.
Implicit Modeling Assumptions

§ Assume Gaussian class conditionals
$p(x \mid \mathcal{C}_k) = \mathcal{N}(x \mid \mu_k, \Sigma)$,
where the covariance matrix $\Sigma$ is shared across all $K$ classes.
§ For $K = 2$ we get (Bishop, 2006) the log-odds
$$a := \log \frac{p(\mathcal{C}_1 \mid x)}{p(\mathcal{C}_2 \mid x)}$$
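Under the shared-covariance assumption this log-odds is a linear function of x, which is what makes the posterior take the logistic form. A small numerical check (names and values are illustrative):

```python
import numpy as np

def log_odds_gaussian(x, mu1, mu2, Sigma, prior1=0.5, prior2=0.5):
    """Direct computation of log p(C1|x)/p(C2|x) with Gaussian class conditionals."""
    P = np.linalg.inv(Sigma)
    def log_gauss(x, mu):
        d = x - mu
        return -0.5 * d @ P @ d  # up to a normalizer shared by both classes
    return log_gauss(x, mu1) - log_gauss(x, mu2) + np.log(prior1 / prior2)

def log_odds_linear(x, mu1, mu2, Sigma, prior1=0.5, prior2=0.5):
    """The equivalent linear form a(x) = w.T x + w0."""
    P = np.linalg.inv(Sigma)
    w = P @ (mu1 - mu2)
    w0 = -0.5 * mu1 @ P @ mu1 + 0.5 * mu2 @ P @ mu2 + np.log(prior1 / prior2)
    return w @ x + w0
```

Both routines compute the same quantity; the quadratic terms in x cancel because the two classes share one covariance matrix.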

Maximum likelihood linear regression for speaker adaptation of

• Consider the case of a continuous-density HMM system with Gaussian output distributions.
• A particular distribution $s$ is characterized by a mean vector $\mu_s$ and a covariance matrix $C_s$.
• Given a parameterized speech frame vector $o$, the probability density of that vector being generated by distribution $s$ is $b_s(o)$.
• Even with limited adaptation data, a transformation may still be applied (global transformation).

Estimation of MLLR regression matrices

• 1. Definition of the auxiliary function
– Assume the adaptation data, $O$, is a series of $T$ observations. The auxiliary function then takes the form

$$
Q = \text{constant} \cdot F(O \mid \lambda) - \sum_{j=1}^{S} \sum_{t=1}^{T} \gamma_j(t)\left[ \frac{n}{2}\log(2\pi) + \frac{1}{2}\log|C_j| + \frac{1}{2}\,(o_t - \mu_j)'\, C_j^{-1}\, (o_t - \mu_j) \right]
$$

where $\gamma_j(t)$ is the occupation probability of distribution $j$ at time $t$, $S$ is the number of distributions, and $n$ is the dimensionality of the observations.

Logistic Regression Analysis (lecture slides)
(2) Linear regression analysis: because the dependent variable here is categorical, the normality requirement cannot be met, and some independent variables affect the outcome non-linearly.

Logistic regression is suitable not only for etiological analysis but also for other research: studying the relationship between a binary (or unordered/ordered multi-class) outcome variable and relevant factors.

Types of logistic regression:
(1) Binary-outcome logistic regression: when the dependent variable is binary, either unconditional or conditional logistic regression can be used. Unconditional logistic regression is mostly used for unmatched case-control or cohort study data; conditional logistic regression is mostly used for paired or matched data.
(2) Multi-class logistic regression: when the dependent variable has multiple categories, a multinomial logistic regression model or an ordinal logistic regression model can be used.
[Figure: schematic of the case-control design — exposure histories are collected retrospectively and the exposure proportions among cases and controls are compared]
Exposure      Cases    Controls   Total
Exposed       a        b          a+b (n1)
Unexposed     c        d          c+d (n2)
Total         a+c      b+d        n
Odds ratio (OR): in case-control studies, the measure of the strength of association between disease and exposure; also called the cross-product ratio.

The relative risk (RR) is essentially the ratio of the incidence (or probability of disease) in the exposed group to that in the unexposed group. A case-control study cannot estimate incidence, so only the odds ratio OR can be computed. OR carries the same interpretation as RR: how many times higher the disease risk is in the exposed group than in the unexposed group. When the incidence of the disease is below 5%, OR is an excellent approximation to RR.

OR > 1: the factor increases the risk of disease (a risk factor);
OR < 1: the factor decreases the risk of disease (a protective factor).
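Both measures follow directly from the 2×2 table of counts. A minimal sketch using the cell labels from the table above (a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls):

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / (bc): odds of exposure in cases vs controls."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]: risk in the exposed vs unexposed group.

    Only meaningful for cohort data; in case-control data use the OR instead.
    """
    return (a / (a + b)) / (c / (c + d))
```

For a rare disease (incidence below about 5%) b ≈ a+b and d ≈ c+d, so the OR computed from case-control counts closely approximates the RR.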
Migratory Logistic Regression for Learning Concept Drift Between Two Data Sets with Application to UXO Sensing
Xuejun Liao and Lawrence Carin Department of Electrical and Computer Engineering Duke University Durham, NC 27708-0291, USA {xjlia
Abstract— To achieve good generalization in supervised learning, the training and testing examples are usually required to be drawn from the same source distribution. In this paper we propose a method to relax this requirement in the context of logistic regression. Assuming Dp and Da are two sets of examples drawn from two different distributions T and A (called concepts, borrowing a term from psychology), where Da are fully labeled and Dp partially labeled, our objective is to complete the labels of Dp. We introduce an auxiliary variable µ for each example in Da to reflect its mismatch with Dp. Under an appropriate constraint the µ’s are estimated as a byproduct, along with the classifier. We also present an active learning approach for selecting the labeled examples in Dp. The proposed algorithm, called migratory logistic regression (MigLogit), is demonstrated successfully on simulated data as well as on real measured data of interest for unexploded ordnance (UXO) cleanup.

I. INTRODUCTION

In supervised classification problems, the goal is to design a classifier using the training examples (labeled data) such that the classifier predicts the labels correctly for unlabeled test data. The accuracy of the predictions is significantly affected by the quality of the training examples, which are assumed to contain essential information about the test instances for which predictions are desired. A common assumption utilized by learning algorithms is that the training examples and the test instances are drawn from the same source distribution. As a practical example, consider the detection of a concealed entity based on sensor data collected in a non-invasive manner. This problem is of relevance in several practical problems, for example in the medical imaging of potential tumors or other hidden anomalies. In the context of remote sensing, one is often challenged with the problem of detecting and characterizing a concealed (e.g., underground) target based on remotely collected sensor data. In an example that will be considered explicitly in this paper, consider the detection of buried unexploded ordnance (UXO) [1]. An unexploded ordnance is a bomb that did not explode upon impact with the ground, and such items pose great danger if disturbed (excavated) without care. Sensors used for detecting and characterizing UXO include magnetometers and electromagnetic induction [1]. In designing an algorithm for characterization of anomalies detected by such sensors, to determine if a given buried item is UXO or clutter, one typically requires training data. Such training data typically comes from other former bombing sites that have been cleaned, and there is a significant issue as to whether such extant labeled sensor data are relevant for a new site under test. The challenge addressed in this paper involves learning the relevance and relationship of existing labeled (training) data for analysis of a new unlabeled or partially labeled data set of interest. This type of problem has significant practical relevance for UXO sensing, for which results are presented on measured data, as well as for the aforementioned classes of problems, for which there is uncertainty concerning the appropriateness of existing labeled data for a new set of unlabeled data of interest. To place this problem in a mathematical setting, let T (x, y) be the probability distribution (or concept, borrowing a term from psychology¹) from which test instances (each including a feature vector x and the associated class label y) are drawn. The goal in classifier design is to minimize a loss function L(y, ζ(x)), which is a quantitative measure for the loss incurred by the classifier when it predicts ζ(x) for x whose true label is y. The minimization is performed for N independent training examples (x, y) drawn from T (x, y), leading to the empirical loss minimization [3], [4]

$$\min \sum_{i=1}^{N} L\bigl(y_i, \zeta(x_i)\bigr), \qquad (x, y) \sim T(x, y) \tag{1}$$

The empirical loss is known to approach the true loss when N → ∞. A learning algorithm based on the empirical loss minimization in (1) implicitly assumes that the future test instances are also drawn from T (x, y). It is this assumption that assures that the classifier generalizes to test instances when it is trained to minimize empirical loss on training examples. This assumption, however, is often violated in practice, since training examples and test instances may correspond to different collections of measurements (likely performed at different times under different experimental conditions) and the class memberships of the measurements may also change. These issues can introduce statistical differences between the

¹ Traditionally, the (probabilistic) mapping Pr(y|x) is called a concept, and Pr(x) is called a virtual concept (language describing the concept) [2]. For simplicity, usually they are collectively called a concept.