时间序列回归模型R实现

时间序列回归模型

1干预分析

1.1概念及模型

Box和Tiao引入的干预分析提供了对于干预影响时间序列的效果进行评估的一个框架，假设干预是可以通过时间序列的均值函数或者趋势而对过程施加影响，干预可以自然产生也可以人为施加的，如国家的宏观调控等。

其模型可以如下表示：

其中mt代表均值的变化，Nt是ARIMA过程。

1.2干预的分类

阶梯响应干预

脉冲响应干预

1.3干预的实例分析

1.3.1模型初探

对数化航空客运里程的干预模型的估计

> data(airmiles)

> acf(as.vector(diff(diff(window(log(airmiles),end=c(2001,8)),12))),lag.max=48)#用window得到在911事件以前的未爱干预的时间序列子集

对暂用的模型进行诊断

>fitmode<-arima(airmiles,order=c(0,1,1),seasonal=list(order=c(0,1,0)))

> tsdiag(fitmode)

从诊断图可以看出存在三个异常点，acf在12阶存在高度相关因此在季节中加入MA（1）系数。

1.3.2拟合带有干预信息的模型

函数：

arimax(x, order = c(0, 0, 0), seasonal = list(order = c(0, 0, 0), period = NA), xreg = NULL, include.mean = TRUE, transform.pars = TRUE, fixed = NULL,

init = NULL, method = c("CSS-ML", "ML", "CSS"), n.cond, optim.control = list(),

kappa = 1e+06, io = NULL, xtransf, transfer = NULL)

arimax函数扩展了arima函数，可以处理时间序列中干扰分析及异常值。假设干扰影响过程的均值，相对未受干扰的无价值函数的偏离用一些协变量的ARMA滤波器的输出这种来表示，偏差被称作传递函数。构造传递函数的协变量通过xtransf参数以矩阵或者data.frame的形式代入arimax函数。

air.m1=arimax(log(airmiles),order=c(0,1,1),seasonal=list(order=c(0,1,1),

period=12),xtransf=data.frame(I911=1*(seq(airmiles)==69),

I911=1*(seq(airmiles)==69)),

transfer=list(c(0,0),c(1,0)),xreg=data.frame(Dec96=1*(seq(airmiles)==12),

Jan97=1*(seq(airmiles)==13),Dec02=1*(seq(airmiles)==84)),method='ML')

> air.m1

Call:

arimax(x = log(airmiles), order = c(0, 1, 1), seasonal = list(order = c(0, 1,

1), period = 12), xreg = data.frame(Dec96 = 1 * (seq(airmiles) == 12), Jan97 = 1

(seq(airmiles) == 13), Dec02 = 1 * (seq(airmiles) == 84)), method = "ML",

xtransf = data.frame(I911 = 1 * (seq(airmiles) == 69), I911 = 1 * (seq(airmiles) ==

69)), transfer = list(c(0, 0), c(1, 0)))

Coefficients:

ma1 sma1 Dec96 Jan97 Dec02 I911-MA0 I911.1-AR1 I911.1-MA0

-0.3825 -0.6499 0.0989 -0.0690 0.0810 -0.0949 0.8139 -0.2715

s.e. 0.0926 0.1189 0.0228 0.0218 0.0202 0.0462 0.0978 0.0439

sigma^2 estimated as 0.0006721: log likelihood = 219.99, aic = -423.98

画图

plot(log(airmiles),ylab="log(airmiles)")

points(fitted(air.m1))

Nine11p=1*(seq(airmiles)==69)

plot(ts(Nine11p*(-0.0949)+

filter(Nine11p,filter=.8139,method='recursive',side=1)*(-0.2715),

frequency=12,start=1996),type='h',ylab='9/11 Effects')

abline(h=0)

从上图可以看出在2003年底后，911事件的影响效应才平息，航班客运量恢复了正常。

2异常值

在时间序列中异常有两种，可加异常和新息异常，分别记AO和IO。

2.1异常值示例

2.1.1模拟数据

模拟一般的ARIMA（1，0，1），然后故意将第10个观测值变成异常值10.

> set.seed(12345)

> y=arima.sim(model=list(ar=0.8,ma=0.5),n.start=158,n=100)

> y

Time Series:

Start = 1

End = 100

Frequency = 1

[1] 0.49180881 -0.22323665 -0.99151270 -0.73387818 -0.67750094 -1.14472133 -2.14844671 -2.49530794

[9] -1.50355358 -2.12615253 -0.55651713 0.41326344 0.51869129 1.86210605 2.19935472 2.60210165

[17] 0.79130003 0.26265426 2.93414857 3.99045889

3.60822678 1.17845765 -0.87682948 -1.20637799

[25] -1.39501221 -0.18832171 1.22999827 1.46814850 2.66647491 3.23417469 2.60349624 1.49513215

[33] 1.48852142 0.95739219 1.30011654 1.73444053

2.84825103

3.73214655

4.23579456 3.37049790

[41] 2.02783955 1.41218929 -0.29974176 -1.58712591 -1.34080878 0.10747609 1.44651081 1.67809487

[49] -0.34663129 -0.50291459 0.01739605 -0.01426474 0.94217204 0.39046221 -0.39883530 1.60638918

[57] 1.70668201 1.37518194 1.91824534 0.14254056

-2.88169481 -3.30372327 -1.74068408 -3.24868057

[65] -3.89415683 -3.45920240 -1.11042078 0.67959744 0.67051084 0.44394061 1.89536060 2.36063873

[73] 2.00559443 0.86443324 0.46847572 0.72338498

1.60215098 1.25922277 1.53180859 0.96289779

[81] 1.07712188 1.42386354 0.56318008 -0.46689543 -0.91861106 -1.92947085 -2.18188785 -1.02759087

[89] 2.31088272 3.13847319 3.01237881 3.43454807

2.31539494 2.44909873 2.91589141 1.12648908

[97] -0.08123871 0.44412579 0.26116418 -0.45815484

> y[10]<-10

2.1.2模型初步判断

> acf(y)

> pacf(y)

> eacf(y)

AR/MA

0 1 2 3 4 5 6 7 8 9 10 11 12 13

0 x x o o o o o o o o o o o o

1 o o o o o o o o o o o o o o

2 o o o o o o o o o o o o o o

3 o x o o o o o o o o o o o o

4 o x o o o o o o o o o o o o

5 x x o o o o o o o o o o o o

6 x o o o o o o o o o o o o o

7 o x o o o o o o o o o o o o

从三个的结果来看，可以初步分析y是AR（1）模型2.1.3对模型时行拟合

> m1=arima(y,order=c(1,0,0))

> m1

Call:

arima(x = y, order = c(1, 0, 0))

Coefficients:

ar1 intercept

0.5419 0.7096

s.e. 0.0831 0.3603

2.1.4对模拟模型进行异常值探测> detectAO(m1)

[,1] [,2] [,3]

ind 9.000000 10.000000 11.000000 lambda2 -4.018412 9.068982 -4.247367

> detectAO(m1,robust=F)

[,1]

ind 10.000000

lambda2 7.321709

> detectIO(m1)

[,1] [,2]

ind 10.000000 11.00000

lambda1 7.782013 -4.67421

AO探测结果认为第9，10，11.可能出现异常值。IO探测认为第10，11可能出现了异常值。由于检验统计量的最大取值出现在10且AO〉IO，所以更认为出现异常值在第10是AO异常

2.1.5考虑异常值的时间序列拟合

> m2=arima(y,order=c(1,0,0),xreg=data.frame(AO=seq(y)==10))

> m2

Call:

arima(x = y, order = c(1, 0, 0), xreg = data.frame(AO = seq(y) == 10))

Coefficients:

ar1 intercept AO

0.8072 0.5698 10.9940

s.e. 0.0570 0.5129 0.8012

sigma^2 estimated as 1.059: log likelihood = -145.29, aic = 296.58 > detectAO(m2)

[1] "No AO detected"

> detectIO(m2)

[1] "No IO detected"

2.1.6比较有无异常值的两模型

再次进行异常值探测时，没有发现异常值，验证最初序列异常出现在10的猜测对比模型1和2的拟合效果

> tsdiag(m2)

> tsdiag(m1)

虽然模型二的残差通过引入异常值后正太性是显性的，但是其acf和P值结果显示引入MA （1）是必要的。

2.1.7重新拟合适当模型

> m3=arima(y,order=c(1,0,1),xreg=data.frame(AO=seq(y)==10))

> detectAO(m3)

[1] "No AO detected"

> detectIO(m3)

[1] "No IO detected"

> tsdiag(m3)

> m3

Call:

arima(x = y, order = c(1, 0, 1), xreg = data.frame(AO = seq(y) == 10))

Coefficients:

ar1 ma1 intercept AO

0.6596 0.6154 0.5850 11.1781

s.e. 0.0799 0.0796 0.4132 0.4755

sigma^2 estimated as 0.793: log likelihood = -131.16, aic = 270.33

模型的拟合效果是显著提高。Acf和P 值检验也一步通过。

> plot(y,type='b')

> arrows(40,7,11,9.8,length=0.8,angle=30)

2.2另一个现实例子

数据包中的co2

m1.co2=arima(co2,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=1 2))

> m1.co2

Call:

arima(x = co2, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))

Coefficients:

ma1 sma1

-0.5792 -0.8206

s.e. 0.0791 0.1137

sigma^2 estimated as 0.5446: log likelihood = -139.54, aic = 283.08 > detectAO(m1.co2)

[1] "No AO detected"

> detectIO(m1.co2)

[,1]

ind 57.000000

lambda1 3.752715

拟合含有新息异常的模型

m4.co2=arimax(co2,order=c(0,1,1),seasonal=list(order=c(0,1,1),period= 12),io=c(57))

> m4.co2

Call:

arimax(x = co2, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12),

io = c(57))

Coefficients:

ma1 sma1 IO-57

-0.5925 -0.8274 2.6770

s.e. 0.0775 0.1016 0.7246

sigma^2 estimated as 0.4869: log likelihood = -133.08, aic = 272.16 模型显示AIC相比之前模型一更小了。而且IO效应的P 值=2.677/0.7246是显著的.

3伪相关

在时间序列中引入协变量，如非洲牧草产量通常与某些气候指标密切相关，在这种发问下在通过在时间序列模型中纳入相关的协变量，将有助于更好的了解基础过程以及得到更为准确的预测。

3.1模拟数据

set.seed(12345)

X=rnorm(105)

Y=zlag(X,2)+.5*rnorm(105)

X=ts(X[-(1:5)],start=1,freq=1)

Y=ts(Y[-(1:5)],start=1,freq=1)

ccf(X,Y,ylab='CCF')

从ccf中可以看出两样本在滞后2期存在明显的相关性。

3.2奶产量与对数化发电量的伪相关

data(milk)

data(electricity)

milk.electricity=ts.intersect(milk,log(electricity))# intersect函数将多个时间序列合并在一个容器中。

ccf(as.numeric(milk.electricity[,1]),as.numeric(milk.electricity[,2]),

main='milk & electricity',ylab='CCF')

两者相关性似乎非常的强，但实际上这是因为他们的各自存在很强的自相关性。

4预白化与随机回归

对于具有强自相关的数据而言，很难评估两个过程之前是否存在依赖关系，因而，宜将x 和y之间的线性关系关联从其各自相关关系中剥离出来。预白化正是为了达到此目的的一个有效工具。

4.1牛奶与电量的CCF预白化校正

> data(milk)

> me.dif=ts.intersect(diff(diff(milk,12)),diff(diff(log(electricity),12)))

> prewhiten(as.vector(me.dif[,1]),as.vector(me.dif[,2]),ylab='CCf')

再次分析两者的相关性，此时除了时滞-3具有边缘显著外，其他地方没有一个相关系数是显著的。幌动防震这给出的35个样本互相关系娄中大约会出现1.75=35x0.05个虚假警报，即这个-3系数的显著可能就是一个虚假的信息。因此，牛奶与耗电量序列实际上是基本不相关的。从而认为之前在原始数据序列中发现的强互相关是伪相关的。

4.2Log（销售量）与价格数据的相关性分析

4.2.1预白化处理

plot(bluebird,yax.flip=T)#画两者的时间序列对比图

预白化处理