[Service Fabric] Stateful Reliable Collection Restore – The New Way

In my previous blog post about Service Fabric stateful service backup, you learned how to set up the ARM template and use PowerShell scripts to create, enable, suspend, and delete your backup policy.

I wanted to provide a separate blog post because, although the PowerShell code for performing a restore isn't too complex, the amount of information you need, and how to find that information, is critical to the success of the restore operation.

Pre-requisites (what I tested this on)

  • Visual Studio 2017 v15.9.4
  • Microsoft Azure Service Fabric SDK – 3.3.622

Code location:


It is assumed that you have read my previous blog post, have your cluster set up, the Voting application installed, and backups already collected in your Azure storage account's blob container. After all, you can't do a restore without a backup, now can you?

In the previous post, I did an 'application' backup, which backed up every stateful service and partition in the application. I only had one stateful service with one partition, so technically I could have just executed the REST API command for backing up a single partition.

What you will see below is that we will restore to a particular partition in the running stateful service, which should help you understand how to find the backup of the partition to restore from. The restore operation will automatically look in the storage location specified by the backup policy, but you could also customize where the restore operation gets its data.

Task 1 – Capturing the Partition ID

The first piece of information you are going to need is the partitionId of the partition you want to restore to.

To find your partitionId, log in to the Azure portal and go to your cluster resource. Open Service Fabric Explorer. In the explorer, you can open the treeview where the fabric:/Voting/VotingData service is and capture/copy the partitionId.


Task 2 – Capturing the RestorePartitionDescription information

Now that you have your partitionId, you need the information stored in the backup you want to restore from. This is called the RestorePartitionDescription object. NOTE: If you do not provide this information, the restore will simply use the most recent backup.

The RestorePartitionDescription information includes the BackupId, BackupLocation, and optionally a backup storage description. But how do you get the BackupId and BackupLocation? You can do this by getting the partition backup list. Here is an example:

GET http://localhost:19080/Partitions/1daae3f5-7fd6-42e9-b1ba-8c05f873994d/$/GetBackups?api-version=6.4&StartDateTimeFilter=2018-01-01T00:00:00Z&EndDateTimeFilter=2018-01-01T23:59:59Z

The only parameters actually required in the GET request are the partitionId and api-version. Leave the api-version at 6.4 until a later Service Fabric release updates this feature. When you run the command above, you will get back a list of backups for this partition, and from this list you can choose which backup you want to restore. The information describing that backup is the restore partition description information.
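If you would rather script this call than type the URL by hand, here is a minimal sketch of how the GetBackups URL is put together (Python purely for illustration; the cluster address and partitionId are the example values from above, and the helper name is my own):

```python
from urllib.parse import urlencode

def get_backups_url(cluster, partition_id, start_filter=None,
                    end_filter=None, api_version="6.4"):
    """Build the partition-level GetBackups URL; only the partitionId and
    api-version are required, the two date filters are optional."""
    params = {"api-version": api_version}
    if start_filter:
        params["StartDateTimeFilter"] = start_filter
    if end_filter:
        params["EndDateTimeFilter"] = end_filter
    return (f"http://{cluster}:19080/Partitions/{partition_id}"
            f"/$/GetBackups?{urlencode(params)}")

url = get_backups_url("localhost", "1daae3f5-7fd6-42e9-b1ba-8c05f873994d",
                      "2018-01-01T00:00:00Z", "2018-01-01T23:59:59Z")
print(url)
```

Note that `urlencode` percent-encodes the colons in the timestamps, which the service accepts just as well as the literal form shown above.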

Your output from the GetBackups call should look something like this:

{
  "ContinuationToken": "<app-or-service-info>",
  "Items": [
    {
      "BackupId": "3a056ac9-7206-43c3-8424-6f6103003eba",
      "BackupChainId": "3a056ac9-7206-43c3-8424-6f6103003eba",
      "ApplicationName": "fabric:/<your-app-name>",
      "ServiceManifestVersion": "1.0.0",
      "ServiceName": "fabric:/<your-app-name>/<partition-service-name>",
      "PartitionInformation": {
        "LowKey": "-9223372036854775808",
        "HighKey": "9223372036854775807",
        "ServicePartitionKind": "Int64Range",
        "Id": "1daae3f5-7fd6-42e9-b1ba-8c05f873994d"
      },
      "BackupLocation": "<your-app-name>\\<partition-service-name>\\<partitionId>\\<name-of-zip-file-to-restore>",
      "BackupType": "Full",
      "EpochOfLastBackupRecord": {
        "DataLossVersion": "131462452931584510",
        "ConfigurationVersion": "8589934592"
      },
      "LsnOfLastBackupRecord": "261",
      "CreationTimeUtc": "2018-01-01T09:00:55Z",
      "FailureError": null
    }
  ]
}

You can collect the BackupId and BackupLocation information by running the ListPartitionBackups.ps1 file in the root\Assets folder.
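As a companion to that script, here is a hypothetical helper showing how you might pull the BackupId and BackupLocation out of the GetBackups response. The field names match the sample output above; the entries in the sample response are made up for illustration:

```python
def pick_backup(response, backup_id=None):
    """Return (BackupId, BackupLocation) for the requested backup, or for
    the most recent one when no id is given (mirroring the default
    behavior of restoring from the latest backup)."""
    items = response["Items"]
    if backup_id is not None:
        chosen = next(b for b in items if b["BackupId"] == backup_id)
    else:
        # ISO-8601 UTC timestamps sort correctly as plain strings
        chosen = max(items, key=lambda b: b["CreationTimeUtc"])
    return chosen["BackupId"], chosen["BackupLocation"]

sample = {"Items": [
    {"BackupId": "3a056ac9-7206-43c3-8424-6f6103003eba",
     "BackupLocation": "Voting\\VotingData\\1daae3f5\\full.zip",
     "CreationTimeUtc": "2018-01-01T09:00:55Z"},
    {"BackupId": "6b1e0d4c-0000-0000-0000-000000000000",
     "BackupLocation": "Voting\\VotingData\\1daae3f5\\incr1.zip",
     "CreationTimeUtc": "2018-01-01T09:05:55Z"},
]}
print(pick_backup(sample))
```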

Task 3 – Restoring your partition

Now you will need to run the Restore command and set up the body of your JSON request object with the RestorePartitionDescription information.

1. In the root\Assets folder, open the RestorePartitionBackup.ps1 script.

2. Fill in the parameters. If you scroll down below the parameters, you can see how the JSON body is constructed from the values you enter. Note that for the JSON you are constructing, it is important to have double quotes around each JSON attribute, as you see below.
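To show how little is actually in that JSON body, here is a sketch of the restore request (the helper names are my own, the values are the example ones from earlier, and the cluster's client-certificate authentication is omitted for brevity):

```python
import json
from urllib import request

def restore_body(backup_id, backup_location):
    # These two fields are the minimum RestorePartitionDescription; when
    # restoring from the policy's own storage, no storage description is needed.
    return {"BackupId": backup_id, "BackupLocation": backup_location}

def restore_partition(cluster, partition_id, body, api_version="6.4"):
    """POST the restore request; the service accepts it asynchronously."""
    url = (f"https://{cluster}:19080/Partitions/{partition_id}"
           f"/$/Restore?api-version={api_version}")
    req = request.Request(url, data=json.dumps(body).encode("utf-8"),
                          headers={"Content-Type": "application/json"},
                          method="POST")
    return request.urlopen(req)

body = restore_body("3a056ac9-7206-43c3-8424-6f6103003eba",
                    "Voting\\VotingData\\1daae3f5\\full.zip")
print(json.dumps(body))
```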



3. Save the script and then press F5 to execute the restore. What you should see in the PowerShell cmd window is something like this:


Task 4 – What happens during the restore

In my test, I first ran at least one full backup operation followed by 4 incrementals. Before each backup was taken, I changed the number of votes for Voter1 and Voter2. What I had was something like this:

I wanted to let the backups take place at least past the 4th incremental backup and then what I wanted to do was restore back to Incremental-2 (Voter1 = 6, Voter2 = 7).

But first, here are a couple of things to think about when doing a restore:

  • What happens when I choose to restore Incremental-2? In this case, Incremental-1, Incremental-2 and Full are restored to the reliable collection.
  • What happens to the partition while it's being restored? Is it offline/unavailable? Yes, the partition is down. If you have an algorithm in your code that sends particular data to this partition, you will have to take this downtime into account.
  • If a backup happens to be running when I submit the restore command, what happens? It depends on the timing of the calls, but since a restore from IT/Operations is normally a planned task, you should let any backup that may be running finish before doing a restore.
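Because the partition is down while the restore runs, a script that needs to wait for it can poll the GetRestoreProgress endpoint until the RestoreState leaves its in-progress values. A sketch (the helper names and polling interval are my own choices):

```python
import json
import time
from urllib import request

def restore_progress_url(cluster, partition_id, api_version="6.4"):
    return (f"https://{cluster}:19080/Partitions/{partition_id}"
            f"/$/GetRestoreProgress?api-version={api_version}")

def wait_for_restore(cluster, partition_id, poll_seconds=10):
    """Poll until the restore leaves its in-progress states, then return
    the final progress document (state, restored epoch/LSN, or error)."""
    while True:
        with request.urlopen(restore_progress_url(cluster, partition_id)) as resp:
            progress = json.load(resp)
        if progress["RestoreState"] not in ("Accepted", "RestoreInProgress"):
            return progress
        time.sleep(poll_seconds)

print(restore_progress_url("localhost", "1daae3f5-7fd6-42e9-b1ba-8c05f873994d"))
```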

On with the show…here is what my current screen looks like for the Voting app, I’m on Incremental-4:


With my partitionId available, along with the backupId and information specific to Incremental-2, I kick off the RestorePartitionBackup.ps1 script.

Here is what the stateful service looked like while the partition was being restored:


After a minute or so (the time will depend on the size of your data restore), this was the result, which is back to the Incremental-2 data values:



I realize this is a really small sample, with a single partition being restored, but the exercise covers the fundamentals of how a restore takes place and what happens during the process.

I hope this blog post can help you out with your own restore processes!

For more information on the REST API for the Restore command, go here https://docs.microsoft.com/en-us/rest/api/servicefabric/sfclient-api-restorepartition.

[Service Fabric] Stateful Reliable Collection Backup – The New Way

In my previous article on Service Fabric Backup and Restore, you saw that setting up the process via the APIs was pretty tedious, and that it was almost entirely developer driven (meaning C# coding). It was also a bit confusing to figure out how the full and incremental backups were stored in their folder structures.

Recently, the Service Fabric team announced general availability of a new method of backup and restore for stateful reliable collections (it requires version 6.4 of the Service Fabric bits). You may see it described as a 'no code' method of performing a backup/restore; I'll explain what that means below.

What this blog post provides is a complete project sample showing how to set up and perform a backup (but not a restore) with PowerShell, an ARM template, and code, to help you understand how to tie this all together. I'll cover restore in a future post.

What do you mean by ‘No Code’?

You may see a description of the new backup/restore procedure that says it is 'no code'. What that actually means is that there are two ways of configuring your backup/restore: one is a C# API (this is what developers would use to configure/build the backup/restore process), and the other is PowerShell scripts, which is the 'no code' option (even though, technically, yes, it is code). 'No code' simply means that developers are not the ones writing code to set things up.

Most customers I have worked with DO NOT want their developers to have control over the backup/restore process, since it is considered an HA/DR/Operations procedure. Using PowerShell takes the configuration/deployment out of the compiled code and puts it back into the IT operations realm. In fact, the developer may not even know a backup is being taken while the service is running. As I stated earlier, we'll discuss Restore in a later post.

I tend to agree with the IT operations folks that backup/restore should not be a developer's focus; therefore, that is the way we'll do it in this blog post.

Pre-requisites (what I tested this on)

  • Visual Studio 2017 v15.9.5
  • Microsoft Azure Service Fabric SDK – 3.3.622

Code location:


SFBackupV2 will be known in this post as the ‘root’ folder.

Task 1: Creating your certificate and Azure Key Vault

1. In the root\Assets folder, you will find a PowerShell script named CreateVaultCerts.ps1. Open PowerShell ISE as an administrator and open this file.

2. At the top of the script, you will see several parameters that you need to fill in depending on your subscription/naming conventions.


3. After filling in your parameters, log in to Azure using the PowerShell command prompt window in the ISE editor by using the command:

Task 2: Deploy your cluster using the ARM template

We will be building a 3 node secure cluster via an ARM template. The cluster will be secured with a single certificate and this single certificate will also be referenced by the backup configuration. We will not be using Azure Active Directory (AAD) in this sample. So what’s special about this template?

1. Using your editor of choice, open the ServiceFabricCluster.json file in the root\Assets folder. Although the template file is already set up appropriately, it's important to understand some of the required settings.

In order to use the new Backup/Restore service, you need to have it enabled in the cluster. First, you need to be using the correct API version for Microsoft.ServiceFabric/clusters:
"apiVersion": "2018-02-01",
"type": "Microsoft.ServiceFabric/clusters",
"name": "[parameters('clusterName')]",
"location": "[parameters('clusterLocation')]",


2. Next, you need to enable the backup/restore service inside of your addonFeatures section of Microsoft.ServiceFabric/clusters:

"properties": {
    "addonFeatures": ["BackupRestoreService"]


3. Next, add a section in the fabricSettings for your X.509 certificate for the encryption of the credentials. Here, we’ll just use the same certificate we use for the cluster to make it more simple.

"properties": {
    "addonFeatures": ["BackupRestoreService"],
    "fabricSettings": [{
        "name": "BackupRestoreService",
        "parameters": [{
            "name": "SecretEncryptionCertThumbprint",
            "value": "[Thumbprint]"
        }]
    }]
}


4. Now open the ServiceFabricCluster.parameters.json file located in the root\Assets folder. There are several parameters that need to be filled in. Any parameter value that is already filled in, leave as is. You will also notice that some of the parameter values needed are ones you should have from running Task 1 (cert thumbprint, source vault resource ID, etc.).


NOTE: For the clusterName, you only need to provide the first part of the FQDN like <clusterName>.eastus.cloudapp.azure.com.

One particular parameter to note is 'osSkuName'. This is the VM size/class that will be used for the cluster. At minimum, this needs to be a Standard_D2_v2.

5. Once you have entered your parameter values, save the file and then open the root\Assets\deploy.json file in PowerShell ISE. In order to execute this script, you'll need to know your subscriptionId, the resource group name you want your cluster created in, and the Azure region (data center). Press F5 to execute the script. It will take approximately 20 minutes to create the cluster.

Task 3: Review your cluster services

    1. Log in to the Azure portal and go to the resource group where your Service Fabric cluster resides.
    2. Click on the name of your cluster and then in the cluster blade, click on the link to open the Service Fabric Explorer.
    3. Expand the Services item in the treeview and you should see a BackupRestoreService system service listed.


Task 4: Deploy the Voting application

1. Open Visual Studio 2017 as an administrator and then open the Voting.sln solution in the root\Voting folder.

2. Rebuild the project to ensure that all the NuGet packages have been restored.

3. Right-click on the Voting project and select Publish.

4. In the Publish dialog, pick your subscription and your cluster name. Make sure you have the Cloud.xml profile and parameters file selected. Once you select your cluster name, you should see a green check after the VS publish mechanism connects to your cluster. If you see a red X instead, you can still try to publish and then look at the output to see what the actual error is. NOTE: If you see the red X, go into the PublishProfiles\Cloud.xml file and make sure your cluster name and certificate thumbprint are listed there:

<ClusterConnectionParameters ConnectionEndpoint="<FQDN-of-your-cluster>:19000" X509Credential="true" ServerCertThumbprint="<cluster-thumbprint>" FindType="FindByThumbprint" FindValue="<cluster-thumbprint>" StoreLocation="CurrentUser" StoreName="My" />

5. Click the Publish button to publish the app to your Service Fabric cluster.

6. Log in to the Azure portal and go to your cluster resource. You should be able to see that the application has been deployed (after a few minutes) and you also want to make sure it is healthy. This can be determined by seeing a green check beside the status.


7. Prior to creating and enabling our backup policy, you need to make sure you have an Azure Storage account set up with a blob container to capture the data being backed up. In this example, I am going to use one of the storage accounts that the Service Fabric cluster uses. Normally this is a bad idea for many reasons (I/O usage, space consumed, etc.), but you can create your own separate storage account in your subscription if you wish. I will create a new blob container named 'blobvotebackup'.

Task 5: Create and enable your backup policy

At this point, you have your cluster created, your app deployed and in a running healthy state. It’s time to create your backup policy and enable it.

1. In PowerShell ISE, open the Backup.ps1 file in the root\Assets folder.

2. There are several parameters to fill in here; they are well commented in the script. This script will create the backup policy and then enable it. Fill in your parameters.


If you scroll down through the script, you’ll see the configuration information for the backup policy.

# Start setting up storage info
$StorageInfo = @{
    ConnectionString = $storageConnString
    ContainerName = $containerName
    StorageKind = 'AzureBlobStore'
}

# Backup schedule info: back up every 5 minutes
$ScheduleInfo = @{
    Interval = 'PT5M'
    ScheduleKind = 'FrequencyBased'
}

# Retention policy: keep backup data for 10 days
$retentionPolicy = @{
    RetentionPolicyType = 'Basic'
    RetentionDuration = 'P10D'
}

# Backup policy parameters
# After 5 incremental backups, do a full backup
$BackupPolicy = @{
    Name = $backupPolicyName
    MaxIncrementalBackups = 5
    Schedule = $ScheduleInfo
    Storage = $StorageInfo
    RetentionPolicy = $retentionPolicy
}

  • Note that the 'StorageKind' is AzureBlobStore. You could also choose an on-premises file store.
  • Also note the Interval, which sets how often a frequency-based backup is taken. This could instead be a schedule for a certain time of day, or an ad-hoc backup. I'm setting mine to 5 minutes just to get the sample code going.
  • There is a retention policy that keeps the backup data for 10 days.
  • MaxIncrementalBackups tells the policy how many incremental backups to take before doing a new full backup. The backup service always starts with a full backup on a newly enabled policy.
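Those PowerShell hashtables translate directly into the JSON body that the Create call expects. As a sketch, here is the same policy expressed in Python (the policy name and connection string are placeholders, and the helper name is my own):

```python
import json

def backup_policy_body(name, connection_string, container_name):
    # Field names mirror the PowerShell hashtables in the script above
    return {
        "Name": name,
        "MaxIncrementalBackups": 5,  # full backup after 5 incrementals
        "Schedule": {"ScheduleKind": "FrequencyBased",
                     "Interval": "PT5M"},  # every 5 minutes
        "Storage": {"StorageKind": "AzureBlobStore",
                    "ConnectionString": connection_string,
                    "ContainerName": container_name},
        "RetentionPolicy": {"RetentionPolicyType": "Basic",
                            "RetentionDuration": "P10D"},  # keep 10 days
    }

print(json.dumps(backup_policy_body("VoteBackupPolicy",
                                    "<storage-connection-string>",
                                    "blobvotebackup"), indent=2))
```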

Since we are using a PowerShell script to create and enable the backup policy, we are calling the BackupRestore REST APIs directly. Notice where the Create call is taking place: look at how the URL is built to create the policy, and the API version being used.

$url = "https://" + $clusterName + ":19080/BackupRestore/BackupPolicies/$/Create?api-version=6.4"

Farther down in the script, you'll see where the URL is created for the EnableBackup command. Notice that we are specifying backup at the 'Application' level, meaning that if the app had more than one stateful service, they would all use the same backup policy. You can also enable backup at a service or partition level.

$url = "https://" + $clusterName + ":19080/Applications/" + $appName + "/$/EnableBackup?api-version=6.4"
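The EnableBackup body itself is tiny: it just names the policy to attach, while the application name goes into the URL path. A sketch of how the URL and body pair up (helper name and values are illustrative):

```python
def enable_backup_request(cluster, app_name, policy_name, api_version="6.4"):
    """Build the URL and JSON body for attaching a backup policy at the
    application level."""
    url = (f"https://{cluster}:19080/Applications/{app_name}"
           f"/$/EnableBackup?api-version={api_version}")
    return url, {"BackupPolicyName": policy_name}

url, body = enable_backup_request("<your-cluster-fqdn>", "Voting",
                                  "VoteBackupPolicy")
print(url, body)
```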

3. Press F5 to execute the script and create/enable the backup policy. From this point, after a few minutes, backups will be created in the background.

Task 6: Confirm that data is being backed up

  1. Go back to the Azure portal, to your storage account, and drill down to your backup blob container. If you click on the container name (after waiting at least 5 minutes), you'll see the folder structure that the full/incremental backup process has created.


A couple of things to note:

  • You’ll see the blob container name in the upper left hand corner
  • You'll see the 'Location' breadcrumb, which shows the name of the blob container, the name of the app, the name of the service in the app, and then the partitionId.
  • For each backup, you have a .bkmetadata and .zip file.

2. To get a complete list of the backups, open the ListBackups.ps1 script in the root\Assets folder.

3. Fill in the parameters, and select F5 to run the script. You should see a list of all the current backups, names, IDs, partition numbers etc. This type of information will be important when you are ready to do a restore. Remember that each partition in a stateful service will have its own backup. You can also find a ListPartitionBackups.ps1 script in the root\Assets folder, just add your partitionID to the script parameters.

Below is a snapshot of the type of information you would see from running ListPartitionBackups.ps1:


Task 7 – Disable and Delete your backup policy

Now that you’ve had all the fun of seeing your services reliable collections being backed up in your blob container, you have a few choices. You can:

  • Suspend – this essentially just stops backups from being taken for a period of time. Suspension is thought of as a temporary action.
  • Resume – resumes a suspended backup.
  • Enable – enables a backup policy.
  • Disable – use this when there is no longer a need to back up data from the reliable collection.
  • Delete – deletes the entire backup policy, but your backed-up data still exists.

One example of using a mix of the settings above is where you could enable a backup for an entire application but suspend or disable a backup for a particular service or partition in that application.
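The application-level actions above all share the same URL shape, with only the action name changing, while deleting a policy targets the BackupRestore endpoint instead. A sketch of the URL construction (helper names and placeholder values are my own):

```python
def backup_action_url(cluster, app_name, action, api_version="6.4"):
    """Application-level backup actions share one URL shape."""
    actions = {"suspend": "SuspendBackup", "resume": "ResumeBackup",
               "enable": "EnableBackup", "disable": "DisableBackup"}
    return (f"https://{cluster}:19080/Applications/{app_name}"
            f"/$/{actions[action]}?api-version={api_version}")

def delete_policy_url(cluster, policy_name, api_version="6.4"):
    # Deleting the policy itself targets the BackupRestore endpoint
    return (f"https://{cluster}:19080/BackupRestore/BackupPolicies/"
            f"{policy_name}/$/Delete?api-version={api_version}")

print(backup_action_url("<cluster>", "Voting", "suspend"))
print(delete_policy_url("<cluster>", "VoteBackupPolicy"))
```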

  1. The RemoveBackup.ps1 script in the root\Assets folder performs the suspend, disable, and delete actions. Depending on what you want to do at this point, set breakpoints within the PowerShell script to first suspend, then disable the backup policy. You will notice that no more backups take place.
  2. Once you are finished with your tests, continue the script to delete the backup policy.


For further information on backup and restore, see https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-backuprestoreservice-quickstart-azurecluster

Client library usage https://github.com/Microsoft/service-fabric-client-dotnet/blob/develop/docs/ClientLibraryUsage.md