I'm applying TL5 to my AIX 6.1 system and migrating from GPFS 3.2 to 3.4, but I have run into a GPFS problem.
All the disks in the GPFS configuration have 4 paths.
For example, for hdisk3, when I run dlnkmgr view -lu -hdev hdisk3 I get:
0752 hdisk3
00001 000010 Online
00002 000011 Online
00000 000020 Online
00003 000021 Online
But when I run mmstartup -a on one node of the cluster, I get this:
0752 hdisk3
00001 000010 Offline(E)
00002 000011 Offline(E)
00000 000020 Online
00003 000021 Offline(E)
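To compare the two outputs quickly, the path states can be tallied with a small helper. This is only a sketch: count_path_status is a name I made up, and it assumes the trimmed three-field path lines exactly as pasted above, saved to a file or piped in.

```shell
# Tally path states from saved `dlnkmgr view -lu -hdev hdiskN` output.
# Assumption: path lines have three fields and the third starts with
# Online or Offline, as in the trimmed output pasted above.
count_path_status() {
  awk 'NF == 3 && $3 ~ /^(Online|Offline)/ { n[$3]++ }
       END { for (s in n) print s, n[s] }' "$@" | sort
}

# Example with the output seen after mmstartup -a:
printf '%s\n' '0752 hdisk3' \
  '00001 000010 Offline(E)' \
  '00002 000011 Offline(E)' \
  '00000 000020 Online' \
  '00003 000021 Offline(E)' | count_path_status
# prints:
# Offline(E) 3
# Online 1
```

On a live system the input would come from something like dlnkmgr view -lu -hdev hdisk3 | count_path_status.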
The GPFS log in /var/adm/ras/mmfs.log.latest shows no errors, but everything takes a long time:
Loading kernel extension from /usr/lpp/mmfs/bin . . .
GPFS: 6027-506 /usr/lpp/mmfs/bin/mmfskxload: /usr/lpp/mmfs/bin/aix64/mmfs64 is already loaded at 1353801728.
Tue Jun 28 15:14:36.262 2011: GPFS: 6027-310 mmfsd64 initializing. {Version: 3.4.0.1 Built: Jul 26 2010 21:05:47} ...
Tue Jun 28 15:19:50.301 2011: GPFS: 6027-2723 This node (172.16.2.152 (pm-db1-labo-priv)) is now Cluster Manager for pm-db-labo.pm-db1-labo-priv.
Tue Jun 28 15:19:50.588 2011: GPFS: 6027-300 mmfsd ready
Tue Jun 28 15:19:50 GMT-03:00 2011: mmcommon mmfsup invoked. Parameters: 172.16.2.152 172.16.2.152 all
Tue Jun 28 15:19:51 GMT-03:00 2011: mounting /dev/gpfs1
Tue Jun 28 15:19:51.285 2011: Command: mount gpfs1 7405632
Tue Jun 28 15:19:51.599 2011: GPFS: 6027-630 Node 172.16.2.152 (pm-db1-labo-priv) appointed as manager for gpfs1.
Tue Jun 28 15:27:06.695 2011: GPFS: 6027-643 Node 172.16.2.152 (pm-db1-labo-priv) completed take over for gpfs1.
Tue Jun 28 15:27:06.725 2011: Command: err 0: mount gpfs1 7405632
Tue Jun 28 15:27:06 GMT-03:00 2011: finished mounting /dev/gpfs1
Tue Jun 28 15:27:06 GMT-03:00 2011: mounting /dev/gpfs2
Tue Jun 28 15:27:07.049 2011: Command: mount gpfs2 6750456
Tue Jun 28 15:27:07.384 2011: GPFS: 6027-630 Node 172.16.2.152 (pm-db1-labo-priv) appointed as manager for gpfs2.
......
Note that hdisk3 belongs to the /gpfs2 filesystem, the last one being mounted in this log.
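To put numbers on the delays, the gap between two log timestamps can be computed like this. It is a rough sketch: elapsed_secs is a hypothetical helper, and it assumes both timestamps are from the same day and in the HH:MM:SS.mmm form used in the log.

```shell
# Elapsed seconds between two same-day HH:MM:SS[.mmm] timestamps
# copied from mmfs.log.latest (helper name is my own).
elapsed_secs() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s, ":"); split(b, e, ":")
    printf "%.3f\n", (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
  }'
}

# gpfs1: "appointed as manager" (6027-630) to "completed take over" (6027-643)
elapsed_secs 15:19:51.599 15:27:06.695   # prints 435.096
# mmfsd64 initializing (6027-310) to mmfsd ready (6027-300)
elapsed_secs 15:14:36.262 15:19:50.301   # prints 314.039
```

So in this log the daemon needed a bit over 5 minutes to become ready, and the takeover of gpfs1 took about 7 minutes 15 seconds.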
Solution
I found the solution to my GPFS problem.
1) Find which HBA the active path goes through
dlnkmgr view -hba
2) List the Fibre Channel adapters
lsdev | grep fcs
3) Once you have identified the problem fcs adapter, remove it
rmdev -Rdl fcs2
4) Delete all the disks that are in use by GPFS
rmdev -Rdl hdisk2
rmdev -Rdl hdisk3
rmdev -Rdl hdisk4
5) Clear the error log
errclear 0
6) Run cfgmgr
7) Check that errpt does not report any problems
8) Run lspath and dlnkmgr view -lu
9) Restart GPFS from one node. First run
mmshutdown -a
and then
mmstartup -a
10) Run the checks again
errpt
dlnkmgr view -hba
lspath
11) Reboot all nodes in the cluster to check that everything is OK
That's all
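Since the device names will differ on another system, steps 3 to 6 can be written out as a dry run first. The sketch below only prints the commands so they can be reviewed before running them by hand; print_rescan_cmds is a hypothetical helper, and fcs2 and hdisk2 to hdisk4 are just the names from my case.

```shell
# Print (do not run) the removal and rediscovery commands from steps 3 to 6.
# Usage: print_rescan_cmds ADAPTER DISK...
print_rescan_cmds() {
  adapter=$1
  shift
  echo "rmdev -Rdl $adapter"      # step 3: remove the problem adapter
  for disk in "$@"; do
    echo "rmdev -Rdl $disk"       # step 4: remove each GPFS disk
  done
  echo "errclear 0"               # step 5: clear the error log
  echo "cfgmgr"                   # step 6: rediscover the devices
}

print_rescan_cmds fcs2 hdisk2 hdisk3 hdisk4
```

Nothing is changed on the system by this function; it is only meant to make the order of the commands explicit.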