Replacing the System Mirror Disk on AIX

Removing the existing mirror:

# cfgmgr
# lsdev -Cc disk
hdisk0 Available 11-09-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 11-09-00-10,0 16 Bit LVD SCSI Disk Drive
# lsvg
rootvg
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 missing 542 148 70..00..00..00..78
# unmirrorvg rootvg hdisk0
0516-1246 rmlvcopy: If hd5 is the boot logical volume, please run 'chpv -c <diskname>'
as root user to clear the boot record and avoid a potential boot
off an old boot image that may reside on the disk from which this
logical volume is moved/removed.
0516-1132 unmirrorvg: Quorum requirement turned on, reboot system for this
to take effect for rootvg.
0516-1144 unmirrorvg: rootvg successfully unmirrored, user should perform
bosboot of system to reinitialize boot records. Then, user must modify
bootlist to just include: hdisk1.
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 missing 542 542 109..108..108..108..109
# chpv -c hdisk0
# reducevg rootvg hdisk0
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
# bootlist -m normal hdisk1
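
Before unmirroring, it can help to pick out mechanically which disk `lsvg -p` flags. The filter below is only a sketch: the function name `pvs_not_active` is made up for illustration, and it assumes the two header lines and the column order seen in the transcript above. The sample text is copied from that transcript so the filter can be tried away from the AIX host.

```shell
# Print the names of physical volumes whose PV STATE is not "active".
# Reads `lsvg -p <vg>` output on stdin; lines 1 and 2 are the VG name and the header.
pvs_not_active() {
    awk 'NR > 2 && $2 != "active" { print $1 }'
}

# Sample copied from the transcript above.
sample='rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 missing 542 148 70..00..00..00..78'

printf '%s\n' "$sample" | pvs_not_active   # prints: hdisk0
```

On the host itself the equivalent call would be `lsvg -p rootvg | pvs_not_active`.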



Rebuilding the mirror:


# lsdev -Cc disk
hdisk0 Available 11-09-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 11-09-00-10,0 16 Bit LVD SCSI Disk Drive
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
# chdev -l hdisk0 -a pv=yes
hdisk0 changed
# extendvg rootvg hdisk0
0516-1398 extendvg: The physical volume hdisk0, appears to belong to
another volume group. Use the force option to add this physical volume
to a volume group.
0516-792 extendvg: Unable to extend volume group.
# extendvg -f rootvg hdisk0
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 active 542 542 109..108..108..108..109
# mirrorvg rootvg
0516-1124 mirrorvg: Quorum requirement turned off, reboot system for this
to take effect for rootvg.
0516-1126 mirrorvg: rootvg successfully mirrored, user should perform
bosboot of system to initialize boot records. Then, user must modify
bootlist to include: hdisk0 hdisk1.
# bosboot -a -d /dev/hdisk0

bosboot: Boot image is 20904 512 byte blocks.
# bootlist -m normal hdisk1 hdisk0
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 active 542 148 76..12..00..00..60
# exit
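
A quick sanity check after `mirrorvg` is that both disks report the same number of used PPs. The sketch below computes used PPs as TOTAL PPs minus FREE PPs from `lsvg -p` output. The function name `used_pps` is hypothetical, the column positions are assumed from the transcript above, and equal counts are only a hint that the mirror is complete, not proof that every copy is synchronized.

```shell
# Print "<pv> <used PPs>" for each physical volume,
# where used PPs = TOTAL PPs (column 3) - FREE PPs (column 4).
# Reads `lsvg -p <vg>` output on stdin and skips its two header lines.
used_pps() {
    awk 'NR > 2 { print $1, $3 - $4 }'
}

# Sample copied from the final `lsvg -p rootvg` output above.
sample='rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 active 542 148 76..12..00..00..60'

printf '%s\n' "$sample" | used_pps
# prints:
# hdisk1 394
# hdisk0 394
```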



On-site diagnosis and repair:



1. Log in to the F85 host and review the error log carefully to confirm the failure of the system mirror disk hdisk0:

# errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
2F3E09A4 0801151907 I H hdisk0 REPAIR ACTION
16F35C72 0801083807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0801003807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0731163807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0731083807 P H hdisk0 DISK OPERATION ERROR
B6048838 0731051907 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0731043107 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
B6048838 0731041907 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED
16F35C72 0731003807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0730163807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0730083807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0730003807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0729163807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0729083807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0729003807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0728163807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0728083807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0728003807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0727163807 P H hdisk0 DISK OPERATION ERROR
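2. Open the front panel. Judging by disk activity (comparing how the two drive LEDs blink), the initial guess was that the system mirror disk sat in slot 1 of the disk cage at the lower right of the front panel, with hdisk1 in slot 3.
3. To confirm this, ask the customer to stop the applications on the host, shut down the database, and shut down the system. Once that is done, open the disk cage cover, pull the disk from slot 1 by hand, and boot the system again. In the output of "lsdev -Cc disk", hdisk0 has changed from "Available" to "Defined" while hdisk1 is unchanged, which confirms that the disk in slot 1 of the cage is hdisk0.
4. According to the service guide, disks in the cage are hot-swappable. With the host running, reinsert the pulled disk into slot 1 and close the cage cover.
5. Run "cfgmgr" to rediscover hdisk0. "lsdev -Cc disk" now shows hdisk0 as "Available", but "lsvg -p rootvg" reports the physical volume hdisk0 as "missing" rather than "active", so the problem persists.
6. Remove hdisk0 from the mirror: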
# unmirrorvg rootvg hdisk0 (remove the rootvg mirror copy held on hdisk0)
# chpv -c hdisk0 (clear the boot record)
# reducevg rootvg hdisk0 (remove hdisk0 from the volume group)
# bootlist -m normal hdisk1 (reset the boot order)
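7. Check with "lsvg -p rootvg" that hdisk0 is no longer part of rootvg. Open the disk cage cover, pull hdisk0 from slot 1, insert the new 36 GB disk into slot 1, and close the cover. Run "cfgmgr" to rescan the hardware. "lsdev -Cc disk" shows the new disk as "hdisk0" in state "Available", so the disk is usable.
8. Mirror rootvg onto hdisk0: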
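# chdev -l hdisk0 -a pv=yes (make the new disk usable as a physical volume by assigning it a PVID)
# extendvg rootvg hdisk0 (add hdisk0 to rootvg)
# mirrorvg rootvg (mirror rootvg; completed successfully after about twenty minutes)
# bosboot -a -d /dev/hdisk0 (create a boot image on hdisk0)
# bootlist -m normal hdisk1 hdisk0 (set the boot order again)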
9. Use "lsvg -p rootvg" to confirm that hdisk0 is in rootvg, is in the "active" state, and uses the same number of PPs as hdisk1 (one PP is 64 MB):
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk1 active 542 148 70..00..00..00..78
hdisk0 active 542 148 76..12..00..00..60
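
The `errpt` listing in step 1 is long, and a per-disk tally makes the pattern easier to see. The following is a minimal sketch, assuming the column layout shown in step 1 (resource name in the fifth field). The function name `disk_error_counts` is hypothetical, and the sample is a short excerpt of that listing rather than the full log.

```shell
# Count "DISK OPERATION ERROR" entries per resource.
# Reads `errpt` summary output on stdin; the resource name is field 5.
disk_error_counts() {
    awk '/DISK OPERATION ERROR/ { n[$5]++ } END { for (r in n) print r, n[r] }'
}

# Excerpt copied from the errpt listing in step 1.
sample='IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
2F3E09A4 0801151907 I H hdisk0 REPAIR ACTION
16F35C72 0801083807 P H hdisk0 DISK OPERATION ERROR
16F35C72 0801003807 P H hdisk0 DISK OPERATION ERROR
B6048838 0731051907 P S SYSPROC SOFTWARE PROGRAM ABNORMALLY TERMINATED'

printf '%s\n' "$sample" | disk_error_counts   # prints: hdisk0 2
```

On the host the equivalent call would be `errpt | disk_error_counts`.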





Finding the disk's location (lessons learned from replacing an F85 disk):


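Last week I traveled out of town to replace a mirror disk in a customer's F85. I had never opened the front panel of that F85, so I did not know where the disks physically sat (two disks, hdisk0 and hdisk1, with hdisk0 being the system mirror of hdisk1). According to the user's guide, the F85 has fourteen bays for disks in total, as shown in the figure from that guide, whose numbered callouts are quoted below: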



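Callouts 5 and 6 read: "Two-Position SCSI Disk Drive Bay: Bay D14 (top), Bay D13 (bottom). Bays for the installation of two SCSI disk drives." Note that disks in these bays are not hot-swappable: to replace one, the machine must be shut down and powered off first.
Callouts 7 and 8 read: "Disk Drive Bay: Bank DB2 (top), Bay DB1 (bottom) (SES or SSA). Bays for the installation of SCSI or SSA disk drives or RAID arrays."
Callouts 9 and 10 read: "Disk Drive: Bay D07 (top left), Bay D12 (top right), Bay D01 (bottom left), Bay D06 (bottom right). Disk drives in a SCSI or SSA disk drive bay."
In short, 7 and 8 are two independent disk cages, and any disk in them is hot-swappable. 9 and 10 mark the disk slots: counting rightward from the position marked 9 gives D07 through D12, and counting rightward from the position marked 10 gives D01 through D06.
Now for my preparation. Before the trip I did not know where the system disk and the mirror disk sat behind the front panel, so I tried to work it out from data gathered during an earlier routine inspection: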
# lsdev -Cc disk
hdisk0 Available 11-09-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 11-09-00-10,0 16 Bit LVD SCSI Disk Drive
# lscfg
INSTALLED RESOURCE LIST

The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
* = Diagnostic support not available.

Model Architecture: chrp
Model Implementation: Multiple Processor, PCI bus

+ hdisk0 P1/Z2-A8 16 Bit LVD SCSI Disk Drive (36400 MB)


+ hdisk1 P1/Z2-Aa 16 Bit LVD SCSI Disk Drive (36400 MB)
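These two commands show that the paths of hdisk0 and hdisk1 are 11-09-00-8,0 and 11-09-00-10,0 respectively (the "Aa" reported by lscfg is hexadecimal, that is, A10). Here I made a mistake: I assumed that the numbers "8" and "10" corresponded one-to-one to slot positions in the two disk cages, so that A8 would be D08 and Aa would be D10. When I opened the disk cage at the customer site, it turned out otherwise: the two disks were in D01 and D03. I was at a loss, because this was so far from my estimate. Even though both disks sat in a hot-swappable cage, I could not tell which was which, so the only option was to shut down and power off, pull one disk, power on again, and see which disk the system booted from. The method is crude, but with only two disks it is reasonably safe. With more than two disks it stops being practical, and in many cases a customer's machine cannot simply be rebooted or shut down.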
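
The hexadecimal reading of the lscfg code can be checked with a one-line conversion. This is a sketch only: the function name `scsi_id_from_physloc` is invented here, and the parsing rule (everything after the final "-A" is a hex SCSI ID) is an assumption drawn from the two sample codes above, not from IBM documentation.

```shell
# Convert the hex digits after the final "-A" of a physical location code
# (for example P1/Z2-Aa) into a decimal SCSI ID.
scsi_id_from_physloc() {
    hex=${1##*-A}            # "P1/Z2-Aa" -> "a"
    printf '%d\n' "0x$hex"   # interpret as hexadecimal, print as decimal
}

scsi_id_from_physloc P1/Z2-A8   # prints: 8
scsi_id_from_physloc P1/Z2-Aa   # prints: 10
```

The results match the "8,0" and "10,0" suffixes that `lsdev -Cc disk` reports for hdisk0 and hdisk1.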
A careful read of the service manual (Service Guide) turns up a detailed definition of AIX location codes, as shown in the figure below:



To match each disk to its slot, the "G,H" positions of the AIX location code are the key. Take the values reported by "lsdev -Cc disk" (hdisk0 is "8,0", hdisk1 is "10,0") and look them up in the service guide:


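The lookup shows clearly that hdisk0, with path code 11-09-00-8,0, corresponds to slot name D01, and hdisk1, with path code 11-09-00-10,0, corresponds to slot name D03. From that, one can tell with certainty that the first slot from the left in the disk cage at the lower right of the F85 front panel holds hdisk0. Once the location is pinned down, the rest of the job is much easier.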
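
The lookup can be written down as a small helper so it does not have to be repeated by hand. This is a hedged sketch: both function names are hypothetical, and the table holds only the two pairs stated in this article (SCSI ID 8 is D01, SCSI ID 10 is D03). Any other entry would have to be taken from the service guide table and is deliberately left out.

```shell
# Extract the SCSI ID (the "G" part of "G,H") from an AIX location code
# such as 11-09-00-8,0.
scsi_id_from_loc() {
    gh=${1##*-}                  # "11-09-00-8,0" -> "8,0"
    printf '%s\n' "${gh%%,*}"    # "8,0" -> "8"
}

# Map a SCSI ID to a slot name. Only the two pairs confirmed in this
# article are listed; anything else is reported as unknown.
slot_for_scsi_id() {
    case $1 in
        8)  echo D01 ;;
        10) echo D03 ;;
        *)  echo unknown; return 1 ;;
    esac
}

slot_for_scsi_id "$(scsi_id_from_loc 11-09-00-8,0)"    # prints: D01
slot_for_scsi_id "$(scsi_id_from_loc 11-09-00-10,0)"   # prints: D03
```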
