IBM Storwize V3700 error 734


Posted by SMU1311 2017-07-09T08:48:42Z

Hi Guys,

After a power loss, the V3700 reports "error 578" on one of the two nodes.

To resolve the error, it suggests the fix "Remove system data" on the affected node.

I have some VMs running on that storage.

My question is: will I be losing any data on the storage, and can I run the fix while the VMs are running and the storage is being used?

8 Replies

  • Do you have a support contract in place with IBM? If not, it might be worth getting approval for a one-off support payment to help out with this.

    I’d also suggest investing in some UPS devices in future. SANs do not like having power removed.

    SMU1311 wrote:

    My question is: will I be losing any data on the storage, and can I run the fix while the VMs are running and the storage is being used?

    You shouldn’t do, no. You’ll be running on a single node at the moment as one of the nodes has an issue and needs to be reinitialised. However, you should also consider what would have happened if this had occurred on both nodes or what’ll happen if you have another power outage now.

    Finally, as ever, make sure you’ve got good, tested backups.



  • Author Sam Sam

    We do not have a support contract.

    The device is connected to a UPS, but due to absence we were unable to verify that the device was shut down properly.



  • Author Tom Demetriou

    See this link before proceeding…

    https://www.ibm.com/support/knowledgecenter/en/STLM5A_7.1.0/com.ibm.storwize.v3700.710.doc/svc_t3int…

    Also, I second Gary’s advice on backups, UPS and contacting IBM and getting support.

    You should also have UPS alerting turned on, and think about network monitoring for these occasions…

    IBM support is pricey but well worth it…



  • SMU1311 wrote:

    We do not have a support contract.

    Get one. These are the times when it’s needed, for situations like this and for firmware updates.



  • Author Sam Sam

    All I’m asking is whether it’s safe to run the fix it suggests, without any data loss to the VMs.

    This is the fix:

    https://www.ibm.com/support/knowledgecenter/en/STLM5A_7.2.0/com.ibm.storwize.v3700.720.doc/svc_t3fix…



  • It should be safe to run the fix. As I said above, make sure you have good, tested backups.



  • Author Sam Sam

    If anyone else has the same question:

    Data will NOT be lost; just make sure to follow the instructions to the letter.



Excerpt from the Storwize V7000: Troubleshooting, Recovery, and Maintenance Guide

Removing system information for node canisters with error code 550 or error code 578 using the service assistant

• Any data that was in the cache at the point of failure is lost. The loss of data can result in data corruption on the affected volumes. If the volumes are corrupted, call the IBM Support Center.

Before you can run a system recovery procedure, it is important that the root cause of the hardware issues be identified and fixed.

Obtain a basic understanding about the hardware failure. In most situations when there is no clustered system, a power issue is the cause. For example, both power supplies might have been removed.

The system recovery procedure works only when all node canisters are in candidate status. If there are any node canisters that display error code 550 or error code 578, you must remove their data.

Before performing this task, ensure that you have read the introductory information in the overall recover system procedure.

To remove system information from a node canister with an error 550 or 578, follow this procedure using the service assistant:

1. Point your browser to the service IP address of one of the nodes, for example, https://node_service_ip_address/service/. If you do not know the IP address or if it has not been configured, you must assign an IP address using the initialization tool.
2. Log on to the service assistant.
3. Select Manage System.
4. Click Remove System Data.
5. Confirm that you want to remove the system data when prompted.
6. Remove the system data for the other nodes that display a 550 or a 578 error. All nodes previously in this system must have a node status of Candidate and have no errors listed against them.
7. Resolve any hardware errors until the error condition for all nodes in the system is None.
8. Ensure that all nodes in the system display a status of candidate.

When all nodes display a status of candidate and all error conditions are None, you can run the recovery procedure.
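
Before kicking off the recovery, it can help to double-check from a workstation that every canister really does show candidate status with no outstanding errors. The sketch below is one possible way to do that over SSH. It assumes the service CLI command sainfo lsservicenodes (and its -delim option) behaves as on typical Storwize/SVC code levels, that the paramiko package is installed, and that the IP address and credentials are placeholders you would replace; column names can vary by firmware, so treat it as a sketch rather than a drop-in tool.

```python
# Hedged sketch: verify all node canisters report candidate status before a T3
# recovery. Assumes `sainfo lsservicenodes -delim :` output with a header row
# containing node_status and error_data columns (may differ by code level).
import paramiko

SERVICE_IP = "192.168.70.121"              # service IP of one node (placeholder)
USER, PASSWORD = "superuser", "passw0rd"   # placeholder credentials

def all_nodes_candidate(host: str) -> bool:
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=USER, password=PASSWORD)
    try:
        _, out, _ = ssh.exec_command("sainfo lsservicenodes -delim :")
        lines = [l for l in out.read().decode().splitlines() if l.strip()]
    finally:
        ssh.close()
    header = lines[0].split(":")
    status_col, error_col = header.index("node_status"), header.index("error_data")
    ok = True
    for row in lines[1:]:
        cols = row.split(":")
        status = cols[status_col] if status_col < len(cols) else "?"
        error = cols[error_col] if error_col < len(cols) else ""
        print(f"panel {cols[0]}: status={status}, error={error or 'None'}")
        if status.lower() != "candidate" or error:
            ok = False
    return ok

if __name__ == "__main__":
    if all_nodes_candidate(SERVICE_IP):
        print("All nodes are candidates with no errors - recovery can proceed.")
    else:
        print("Do NOT start recovery yet; resolve the remaining node errors first.")
```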

Performing system recovery using the service assistant

Start recovery when all node canisters that were members of the system are online and have candidate status. If there are any nodes that display error code 550 or error code 578, you must remove their system data to get them into candidate status. Do not run the recovery procedure on different node canisters in the same system. This restriction includes remote systems also.

All node canisters must be at the same level of software that the storage system had before the system failure. If any node canisters were modified or replaced, use the service assistant to verify the levels of software, and where necessary, to upgrade or downgrade the level of software.
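
To sanity-check that both canisters are at the same software level before recovery, you could query each node's service assistant over SSH. The sketch below assumes that sainfo lsservicestatus reports a code_level field on your firmware (verify against the CLI reference for your code level) and that paramiko is installed; the service IPs and credentials are placeholders.

```python
# Hedged sketch: compare the code level reported by each node canister before
# running system recovery. Assumes `sainfo lsservicestatus` prints a line
# beginning with "code_level" (field naming may differ by firmware).
import paramiko

SERVICE_IPS = ["192.168.70.121", "192.168.70.122"]  # one service IP per node (placeholders)
USER, PASSWORD = "superuser", "passw0rd"            # placeholder credentials

def code_level(host: str) -> str:
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=USER, password=PASSWORD)
    try:
        _, out, _ = ssh.exec_command("sainfo lsservicestatus")
        text = out.read().decode()
    finally:
        ssh.close()
    for line in text.splitlines():
        if line.startswith("code_level"):
            return line.split(None, 1)[1].strip()
    return "unknown"

levels = {ip: code_level(ip) for ip in SERVICE_IPS}
for ip, level in levels.items():
    print(f"{ip}: {level}")
if len(set(levels.values())) == 1:
    print("All canisters report the same code level.")
else:
    print("Code levels differ - align them via the service assistant before recovery.")
```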


IBM Storwize V3700 for Lenovo

Product Guide

IBM Storwize V3700 Storage System for Lenovo (Machine Type 6099) is a member of the IBM Storwize family of disk systems. By using IBM Storwize V7000 Storage System and IBM SAN Volume Controller functions, interoperability, and management tools, Storwize V3700 delivers innovation and new levels of storage efficiency with ease of use in an entry-level disk system to enable organizations to overcome their storage challenges.

Storwize V3700 Storage System features two node canisters, with 4 GB cache per canister upgradeable to 8 GB, in a compact, 2U, 19-inch rack-mount enclosure. 6 Gb SAS and 1 Gb iSCSI connectivity is standard, with an option for 8 Gb Fibre Channel (FC) or 10 Gb iSCSI or Fibre Channel over Ethernet (FCoE) connectivity.

Storwize V3700 supports up to 240 drives with up to nine external expansion units. It also offers flexible drive configurations with the choice of 2.5-inch and 3.5-inch drive form factors, 10 K or 15 K rpm SAS and 7.2 K rpm NL SAS hard disk drives (HDDs), and SAS solid-state drives (SSDs).

The Storwize V3700 SFF enclosure is shown in the following figure.

Figure 1. Storwize V3700 SFF enclosure

Did you know?

Storwize V3700 provides small and midsized organizations with the ability to consolidate and share data at an affordable price, while utilizing advanced software capabilities that often are found in more expensive systems.

Storwize V3700 can be scaled up to 960 TB of raw storage capacity.

Storwize V3700 offers hybrid block storage connectivity with support for 6 Gb SAS, 1 Gb iSCSI, and 10 Gb iSCSI or FCoE or 8 Gb FC at the same time.


I have received this question several times, so it’s clearly something people are interested in.

The Storwize V7000 has two controllers known as node canisters. It’s an active/active storage controller, in that both node canisters are processing I/O at any time and any volume can be happily accessed via either node canister.

The question then gets asked: what happens if a node canister fails and can I test this? The answer to the question of failure is that the second node canister will handle all the I/O on its own.  Your host multipathing driver will switch to the remaining paths and life will go on.  We know this works because doing a firmware upgrade takes one node canister offline at a time, so if you have already done a firmware update, then you have already tested node canister fail over.  But what if you want to test this discretely? There are four ways:

  1. Walk up to the machine and physically pull out a node canister. This is a bit extreme and is NOT recommended.
  2. Power off a node canister using the CLI (using the satask stopnode command). This will work for the purposes of testing node failure, but the only way to power the node canister back on is to pull it out and reinsert it. This is again a bit extreme and is not recommended. This is also different to an SVC, since each SVC node has its own power on/off button.
  3. Use the CLI to remove one node from the I/O group (using the svctask rmnode command). This works on an SVC because the nodes are physically separate. On a Storwize V7000 the nodes live in the same enclosure and a candidate node will immediately be added back to the cluster, so as a test this is not that helpful.
  4. Place one node into service state and leave it there while you check all your hosts. This is my recommended method.

First up, this test assumes there is NOTHING else wrong with your Storwize V7000. We are not testing multiple failures here. You need to confirm that the Recommended Actions panel contains no items. If there are errors listed, fix them first.

Once we are certain our Storwize V7000 is clean and ready for the test, we need to connect via the Service Assistant web GUI. If you have not set up access to the service assistant, please read this blog post first.

So what’s the process?

First, log on to the service assistant on node 1 and place node 2 into service state. I chose node 2 because normally node 1 is the configuration node (the node that owns the cluster IP address). You need to confirm you’re connected to node 1 (check at top right), select node 2 (from the Change Node menu), then choose Enter Service State from the drop-down and hit GO.

You will get a message confirming that you’re placing node 2 into service state. If it looks correct, select OK.

The GUI will pause on this screen for a short period. Wait for the OK button to un-grey.

You will eventually end up with Node 1 Active and Node 2 in Service.

Node 2 is now offline. Go and confirm that everything is working as desired on your hosts (half your paths will be offline but your hosts should still be able to access the Storwize V7000 via the other node canister).
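
If your hosts are Linux servers using dm-multipath, a quick scripted path count can speed up that host check. This is only a rough sketch: the exact multipath -ll output format varies by distribution and multipath-tools version (and the command normally needs root), so treat the parsing as illustrative.

```python
# Hedged sketch: count active/ready paths per multipath map on a Linux host
# during the failover test. Parsing is best-effort; `multipath -ll` formatting
# varies between multipath-tools versions, and the command usually needs root.
import re
import subprocess

out = subprocess.run(["multipath", "-ll"], capture_output=True, text=True).stdout

current_map = None
active_paths = {}
for line in out.splitlines():
    # Map header lines start at column 0 and mention the dm device,
    # e.g. "mpatha (36005076...) dm-2 IBM,2145"
    if line and not line[0].isspace() and " dm-" in line:
        current_map = line.split()[0]
        active_paths.setdefault(current_map, 0)
    # Path lines look like "  |- 1:0:0:0 sda 8:0 active ready running"
    elif current_map and re.search(r"\bactive\b.*\bready\b", line):
        active_paths[current_map] += 1

for name, count in active_paths.items():
    state = "OK" if count > 0 else "NO ACTIVE PATHS!"
    print(f"{name}: {count} active path(s) - {state}")
```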

When your host checking is complete, you can use the same drop-down to Exit Service State on node 2 and select GO.

You will get a pop-up window to confirm your selection. If the window looks correct, select OK.

You will then see a progress panel. You will need to wait for the OK button to become available (to un-grey).

Provided both nodes now show as Active, your test is now complete.
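
If you would rather drive the same test from a script instead of clicking through the service assistant GUI, something along these lines should work over SSH. It assumes the service CLI commands satask startservice and satask stopservice accept a node panel name on your code level (check the CLI reference for your firmware before relying on this) and that paramiko is installed; the service IP, panel name, and credentials are placeholders.

```python
# Hedged sketch: put node 2 into service state, pause while host paths are
# checked, then bring it back - a scripted variant of the GUI steps above.
# Command syntax is assumed (satask startservice / stopservice <panel_name>);
# verify against the CLI reference for your code level first.
import time
import paramiko

SERVICE_IP = "192.168.70.121"   # service IP of node 1, which stays active (placeholder)
NODE2_PANEL = "01-2"            # panel name of node 2 (placeholder)
USER, PASSWORD = "superuser", "passw0rd"

def run(cmd: str) -> str:
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(SERVICE_IP, username=USER, password=PASSWORD)
    try:
        _, out, _ = ssh.exec_command(cmd)
        return out.read().decode()
    finally:
        ssh.close()

run(f"satask startservice {NODE2_PANEL}")       # node 2 enters service state
time.sleep(60)                                  # give it time to go offline to hosts
print(run("sainfo lsservicenodes"))             # node 2 should now show as Service
input("Check your hosts now; press Enter to bring node 2 back... ")
run(f"satask stopservice {NODE2_PANEL}")        # node 2 rejoins the system
```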

About Anthony Vandewerdt

I am an IT Professional who lives and works in Melbourne Australia.
This blog is totally my own work.
It does not represent the views of any corporation.
Constructive and useful comments are very very welcome.


