[Service Fabric] Stateful Reliable Collection Restore – The New Way

In my previous blog post about how to back up a Service Fabric stateful service, you learned how to set up the ARM template and use PowerShell scripts to create, enable, suspend and delete your backup policy.

I wanted to write a separate blog post about restore because, although the PowerShell code for performing a Restore isn't complex, knowing what information you need and where to find it is critical to a successful Restore operation.

Prerequisites (what I tested this on)

  • Visual Studio 2017 v15.9.4
  • Microsoft Azure Service Fabric SDK – 3.3.622

Code location:

https://github.com/larrywa/blogpostings/tree/master/SFBackupV2

This post assumes that you have read my previous blog post, have your cluster set up and the Voting application installed, and already have backups collected in your Azure storage account's blob storage container. After all, you can't do a restore without a backup, now can you?

In the previous posting, I did an ‘application’ backup, which backed up every stateful service and partition in the application. I only had one stateful service with one partition, so technically, I could have just executed the REST API command for backing up a partition if I wanted to.

Below, we will restore to a particular partition in the running stateful service, which should help you understand how to find the backup of the partition you want to restore from. The restore operation automatically looks in the storage location specified by the backup policy, but you can also customize where the restore operation gets its data from.

Task 1 – Capturing the Partition ID

The first piece of information you are going to need is the partitionId of the partition you want to restore to.

To find your partitionId, log in to the Azure portal, go to your cluster resource and open Service Fabric Explorer. In the explorer, expand the tree view down to the fabric:/Voting/VotingData service and copy the partitionId.

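[Screenshot: Service Fabric Explorer showing the fabric:/Voting/VotingData partition and its partitionId]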
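If you would rather script this step than click through Service Fabric Explorer, the same partitionId is available from the cluster's REST endpoint. The Python sketch below builds the URL for the "Get Partition Info List" operation and pulls each partition Id out of the JSON response. It assumes an unsecured endpoint at localhost:19080 (the same one used in the GetBackups example in this post) and api-version 6.0; the helper names are hypothetical, and a secured Azure cluster would also need a client certificate on the request.

```python
import json

# Sketch only: helper names are hypothetical. Assumes an unsecured cluster
# endpoint and api-version 6.0 for the "Get Partition Info List" operation.

FABRIC_PREFIX = "fabric:/"


def service_name_to_id(service_name: str) -> str:
    """Turn 'fabric:/Voting/VotingData' into the REST service id 'Voting~VotingData'."""
    if not service_name.startswith(FABRIC_PREFIX):
        raise ValueError(f"expected a name starting with {FABRIC_PREFIX!r}: {service_name!r}")
    # The REST API addresses a service by its name without 'fabric:/',
    # with each remaining '/' replaced by '~'.
    return service_name[len(FABRIC_PREFIX):].replace("/", "~")


def get_partitions_url(endpoint: str, service_name: str, api_version: str = "6.0") -> str:
    """URL that lists the partitions of one service."""
    service_id = service_name_to_id(service_name)
    return f"{endpoint}/Services/{service_id}/$/GetPartitions?api-version={api_version}"


def partition_ids(response_body: str) -> list[str]:
    """Extract each partition's Id from a GetPartitions JSON response."""
    payload = json.loads(response_body)
    return [item["PartitionInformation"]["Id"] for item in payload.get("Items", [])]


if __name__ == "__main__":
    print(get_partitions_url("http://localhost:19080", "fabric:/Voting/VotingData"))
```

For fabric:/Voting/VotingData this builds http://localhost:19080/Services/Voting~VotingData/$/GetPartitions?api-version=6.0. Keeping URL building and response parsing as plain functions means you can check them without a live cluster and plug in whatever HTTP client your environment requires.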

Task 2 – Capturing the RestorePartitionDescription information

Now that you have your partitionId, you need information about the backup you want to restore from. This information is passed in a RestorePartitionDescription object. NOTE: If you do not provide this information, you will just get the most recent backup.

The RestorePartitionDescription information includes the BackupId, BackupLocation and BackupStorage (BackupStorage is optional; when it is omitted, the storage configured in the backup policy is used). But how do you get the BackupId and BackupLocation? You get them from the partition's backup list. Here is an example:

GET http://localhost:19080/Partitions/1daae3f5-7fd6-42e9-b1ba-8c05f873994d/$/GetBackups?api-version=6.4&StartDateTimeFilter=2018-01-01T00:00:00Z&EndDateTimeFilter=2018-01-01T23:59:59Z

The only parameters actually required in the GET request are the partitionId and the api-version; StartDateTimeFilter and EndDateTimeFilter are optional and limit the list to backups created within that time window. Leave api-version at 6.4 unless a later Service Fabric release updates this feature. When you run the command above, you get back a list of backups for this partition, and from this list you choose the backup you want to restore. The BackupId and BackupLocation of that entry are the restore partition description information.
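To make the required and optional parts of that GET request explicit, here is a small Python sketch that builds the same GetBackups URL. The function name is hypothetical, and the date filters are assumed to already be in UTC; the sketch only formats them, it does not convert time zones.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

# Sketch only: builds the partition backup list (GetBackups) URL.
# partition_id and api_version are required; start/end are optional filters.


def get_backups_url(
    endpoint: str,
    partition_id: str,
    api_version: str = "6.4",
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> str:
    params = {"api-version": api_version}
    # Assumption: the datetimes are already UTC, so a literal 'Z' suffix is correct.
    if start is not None:
        params["StartDateTimeFilter"] = start.strftime("%Y-%m-%dT%H:%M:%SZ")
    if end is not None:
        params["EndDateTimeFilter"] = end.strftime("%Y-%m-%dT%H:%M:%SZ")
    # safe=":" leaves the timestamps unescaped, matching the example URL above.
    query = urlencode(params, safe=":")
    return f"{endpoint}/Partitions/{partition_id}/$/GetBackups?{query}"


if __name__ == "__main__":
    print(get_backups_url(
        "http://localhost:19080",
        "1daae3f5-7fd6-42e9-b1ba-8c05f873994d",
        start=datetime(2018, 1, 1, 0, 0, 0),
        end=datetime(2018, 1, 1, 23, 59, 59),
    ))
```

Called with the partitionId and the 2018-01-01 window, this reproduces the example request exactly; called with only the endpoint and partitionId, it produces the minimal form with just api-version.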

Your output from the GetBackups call should look something like this:

{
  "ContinuationToken": "<app-or-service-info>",
  "Items": [
    {
      "BackupId": "3a056ac9-7206-43c3-8424-6f6103003eba",
      "BackupChainId": "3a056ac9-7206-43c3-8424-6f6103003eba",
      "ApplicationName": "fabric:/<your-app-name>",
      "ServiceManifestVersion": "1.0.0",
      "ServiceName": "fabric:/<your-app-name>/<partition-service-name>",
      "PartitionInformation": {
        "LowKey": "-9223372036854775808",
        "HighKey": "9223372036854775807",
        "ServicePartitionKind": "Int64Range",
        "Id": "1daae3f5-7fd6-42e9-b1ba-8c05f873994d"
      },
      "BackupLocation": "<your-app-name>\\<partition-service-name>\\<partitionId>\\<name-of-zip-file-to-restore>",
      "BackupType": "Full",
      "EpochOfLastBackupRecord": {
        "DataLossVersion": "131462452931584510",
        "ConfigurationVersion": "8589934592"
      },
      "LsnOfLastBackupRecord": "261",
      "CreationTimeUtc": "2018-01-01T09:00:55Z",
      "FailureError": null
    }
  ]
}

You can collect the BackupId and BackupLocation information by running the ListPartitionBackups.ps1 script in the root\Assets folder.
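Picking an entry out of that list and keeping only the fields the restore needs is a small amount of logic. The Python sketch below shows one way to do it; the function names are hypothetical, it ignores ContinuationToken paging (a long backup list may come back in several pages), and choosing "newest" by comparing CreationTimeUtc strings assumes every entry uses the same ISO 8601 UTC format shown in the sample output.

```python
import json
from typing import Any, Optional

# Sketch only: choose one backup from a GetBackups response and reduce it to
# the two required RestorePartitionDescription fields. Paging is not handled.


def pick_backup(response_body: str, backup_id: Optional[str] = None) -> dict[str, Any]:
    items = json.loads(response_body).get("Items", [])
    if not items:
        raise ValueError("no backups were returned for this partition")
    if backup_id is not None:
        # An explicit BackupId wins, e.g. the id of Incremental-2.
        for item in items:
            if item["BackupId"] == backup_id:
                return item
        raise KeyError(f"backup {backup_id} is not in the list")
    # No id given: take the newest entry. String comparison is chronological
    # only because all timestamps share one ISO 8601 UTC format (assumption).
    return max(items, key=lambda item: item["CreationTimeUtc"])


def restore_description(item: dict[str, Any]) -> dict[str, str]:
    """Keep only BackupId and BackupLocation for the restore request body."""
    return {"BackupId": item["BackupId"], "BackupLocation": item["BackupLocation"]}
```

Separating "which backup" from "what goes in the request" keeps the choice (latest versus a specific BackupId) independent of how the request body is built.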

Task 3 – Restoring your partition

Next, you need to run the Restore command, with the body of your JSON request populated from the RestorePartitionDescription information.

1. In the root\Assets folder, open the RestorePartitionBackup.ps1 script.

2. Fill in the parameters. If you scroll down past the parameters, you can see how the JSON body is constructed from the values you enter. Note that in the JSON you are constructing, every attribute name and string value must be enclosed in double quotes, as you see below.

[Screenshots: the RestorePartitionBackup.ps1 parameters and the JSON body built from them]

3. Save the script, then press F5 to execute the restore. You should see something like this in the PowerShell window:

[Screenshot: PowerShell output after submitting the restore request]

Task 4 – What happens during the restore

In my test, I started by running at least one full backup followed by four incremental backups. Each time a backup was performed, I changed the number of votes for Voter1 and Voter2. What I had was something like this:

[Screenshot: Voter1 and Voter2 vote counts at the Full backup and at Incremental-1 through Incremental-4]

I let the backups run at least past the fourth incremental backup and then restored back to Incremental-2 (Voter1 = 6, Voter2 = 7).

But first, here are a few things to think about when doing a restore:

  • What happens when I choose to restore Incremental-2? In this case, the Full backup, Incremental-1 and Incremental-2 are all restored to the reliable collection.
  • What happens to the partition while it's being restored? Is it offline/unavailable? Yes, this partition is down. If you have an algorithm in your code that sends particular data to this partition, you will have to take this downtime into account.
  • If a backup happens to be taking place when I submit the command to do a restore, what happens? It depends on the timing of the calls, but since a restore by IT/Operations is normally a planned task, let any backup that may be running finish before starting the restore.
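The first bullet can be modeled with the BackupChainId field from the GetBackups output: every backup in a chain (one full plus its incrementals) shares the full backup's BackupChainId. The Python sketch below, with a hypothetical function name, lists which backups make up a restore of a given target. Service Fabric works this out itself during the restore; the sketch only illustrates the rule, and ordering by CreationTimeUtc strings assumes a uniform ISO 8601 UTC format.

```python
from typing import Any

# Sketch only: which backups are applied when restoring target_id?
# Same BackupChainId as the target, created at or before the target.


def restore_chain(items: list[dict[str, Any]], target_id: str) -> list[str]:
    target = next((b for b in items if b["BackupId"] == target_id), None)
    if target is None:
        raise KeyError(f"backup {target_id} not found")
    chain = [
        b for b in items
        if b["BackupChainId"] == target["BackupChainId"]
        and b["CreationTimeUtc"] <= target["CreationTimeUtc"]
    ]
    # Oldest first: the Full backup, then each incremental in order.
    chain.sort(key=lambda b: b["CreationTimeUtc"])
    return [b["BackupId"] for b in chain]


if __name__ == "__main__":
    backups = [
        {"BackupId": "Full", "BackupChainId": "Full", "CreationTimeUtc": "2018-01-01T09:00:00Z"},
        {"BackupId": "Incremental-1", "BackupChainId": "Full", "CreationTimeUtc": "2018-01-01T09:10:00Z"},
        {"BackupId": "Incremental-2", "BackupChainId": "Full", "CreationTimeUtc": "2018-01-01T09:20:00Z"},
        {"BackupId": "Incremental-3", "BackupChainId": "Full", "CreationTimeUtc": "2018-01-01T09:30:00Z"},
        {"BackupId": "Incremental-4", "BackupChainId": "Full", "CreationTimeUtc": "2018-01-01T09:40:00Z"},
    ]
    print(restore_chain(backups, "Incremental-2"))
```

With the made-up ids and timestamps above, restoring Incremental-2 yields Full, Incremental-1 and Incremental-2, while Incremental-3 and Incremental-4 are left out.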

On with the show. Here is what my Voting app currently looks like; I'm on Incremental-4:

[Screenshot: the Voting app at Incremental-4]

With my partitionId available, along with the BackupId and BackupLocation for Incremental-2, I kick off the RestorePartitionBackup.ps1 script.

Here is what the stateful service looked like while the partition was being restored:

[Screenshot: the VotingData stateful service while its partition was being restored]

After a minute or so (the time will depend on the size of your data restore), this was the result, which is back to the Incremental-2 data values:

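[Screenshot: the Voting app after the restore, back to the Incremental-2 values (Voter1 = 6, Voter2 = 7)]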
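Instead of watching the partition in Service Fabric Explorer, you could poll the partition's restore progress. The Python sketch below assumes the "Get Partition Restore Progress" REST operation (GET /Partitions/{partitionId}/$/GetRestoreProgress) and its RestoreState values, treating Success, Failure and Timeout as final. fetch_json is a hypothetical callable you supply that performs the GET and returns parsed JSON, which keeps the polling loop independent of any particular HTTP client or certificate setup.

```python
import time
from typing import Any, Callable

# Sketch only: poll restore progress until a final RestoreState is reported.
# Assumption: RestoreState values include Accepted, RestoreInProgress,
# Success, Failure and Timeout; fetch_json is supplied by the caller.

FINAL_STATES = {"Success", "Failure", "Timeout"}


def restore_progress_url(endpoint: str, partition_id: str, api_version: str = "6.4") -> str:
    return f"{endpoint}/Partitions/{partition_id}/$/GetRestoreProgress?api-version={api_version}"


def wait_for_restore(
    fetch_json: Callable[[str], dict[str, Any]],
    url: str,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    for _ in range(max_polls):
        progress = fetch_json(url)
        if progress.get("RestoreState") in FINAL_STATES:
            return progress
        # Not finished yet (e.g. Accepted or RestoreInProgress); wait and retry.
        sleep(poll_seconds)
    raise TimeoutError(f"restore did not reach a final state after {max_polls} polls")
```

How long to poll is a judgment call: as noted above, the restore time depends on the size of the data, so poll_seconds and max_polls should be tuned to your partition rather than taken from these defaults.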

Summary

I realize this is a small sample with a single partition being restored, but the exercise illustrates the fundamentals of how a restore takes place and what happens during the process.

I hope this blog post can help you out with your own restore processes!

For more information on the REST API for the Restore command, see https://docs.microsoft.com/en-us/rest/api/servicefabric/sfclient-api-restorepartition.
