[Service Fabric] Stateful Reliable Collection Backup: The New Way

In my previous article on Service Fabric Backup and Restore, you saw that setting up the process via APIs was pretty tedious, and that it was almost entirely developer driven (meaning C# coding). It was also a bit confusing to figure out how the full and incremental backups were stored in their folder structure(s).

Recently, the Service Fabric team announced general availability of a new method of backup and restore of stateful reliable collections (requires version 6.4 of the Service Fabric bits). You may see this described as a ‘no code’ method of performing a backup/restore; I’ll explain what that means below.

This blog post provides a complete project sample showing how to set up and perform a Backup (but not a Restore) with PowerShell, an ARM template and code, to help you understand how to tie this all together. I’ll cover restore in a future post.

What do you mean by ‘No Code’?

You may see a description of the new backup/restore procedure that says it is ‘no code’. What that actually means is that there are two ways of configuring your backup/restore. One is using a C# API (this is what developers would use to configure/build the backup/restore process). The other is using PowerShell scripts, which is what is called the ‘no code’ option (even though, technically, yes, it is code). ‘No code’ simply means that developers are not the ones writing code to set things up.

Most customers I have worked with DO NOT want their developers to have control over the backup/restore process, since it is considered to be an HA/DR/Operations procedure. Using PowerShell takes the configuration/deployment out of the compiled code and back into the IT operations realm. In fact, the developer may not even know a backup is being taken while the service is running. As I stated earlier, we’ll discuss Restore in a later post.

I tend to agree with the IT Operations folks that backup/restore should not be a developer’s focus, so this is the way we’ll do it in this blog post.

Pre-requisites (what I tested this on)

  • Visual Studio 2017 v15.9.5
  • Microsoft Azure Service Fabric SDK – 3.3.622

Code location:

https://github.com/larrywa/blogpostings/tree/master/SFBackupV2

SFBackupV2 will be known in this post as the ‘root’ folder.

Task 1: Creating your certificate and Azure Key Vault

1. In the root\Assets folder, you will find a PowerShell script named CreateVaultCerts.ps1. Open PowerShell ISE as an administrator and open this file.

2. At the top of the script, you will see several parameters that you need to fill in depending on your subscription/naming conventions.

image

3. After filling in your parameters, log in to Azure using the PowerShell command prompt window in the ISE editor by using the command:
Login-AzureRmAccount

Task 2: Deploy your cluster using the ARM template

We will be building a 3-node secure cluster via an ARM template. The cluster will be secured with a single certificate, and this same certificate will also be referenced by the backup configuration. We will not be using Azure Active Directory (AAD) in this sample. So what’s special about this template?

1. Using your editor of choice, open the ServiceFabricCluster.json file in the root\Assets folder. Although the template file is already set up appropriately, it’s important to understand some of the required settings.

In order to use the new Backup/Restore service, you need to have it enabled in the cluster. First, you need to be using the correct API version for Microsoft.ServiceFabric/clusters:
{
    "apiVersion": "2018-02-01",
    "type": "Microsoft.ServiceFabric/clusters",
    "name": "[parameters('clusterName')]",
    "location": "[parameters('clusterLocation')]",

}

2. Next, you need to enable the backup/restore service inside of your addonFeatures section of Microsoft.ServiceFabric/clusters:

"properties": {
    "addonFeatures": [
        "BackupRestoreService"
    ],

3. Next, add a section in the fabricSettings for the X.509 certificate used to encrypt the credentials. Here, we’ll just use the same certificate we use for the cluster to keep things simple.

"properties": {
    "addonFeatures": ["BackupRestoreService"],
    "fabricSettings": [
        {
            "name": "BackupRestoreService",
            "parameters": [
                {
                    "name": "SecretEncryptionCertThumbprint",
                    "value": "[Thumbprint]"
                }
            ]
        }
    ]
}

4. Now open the ServiceFabricCluster.parameters.json file located in the root\Assets folder. There are several parameters that need to be filled in. Leave any parameter value that is already filled in as is. You will also notice that some of the values needed are ones you should have from running Task 1 (cert thumbprint, source vault resource id, etc.).

image

NOTE: For the clusterName, you only need to provide the first part of the FQDN, i.e. the <clusterName> portion of <clusterName>.eastus.cloudapp.azure.com.

One particular parameter to note is the ‘osSkuName’. This is the size/class VM that will be used for the cluster. At minimum, this needs to be a Standard_D2_v2.

5. Once you have entered your parameter values, save the file and then open the root\Assets\deploy.json file in PowerShell ISE. In order to execute this script, you’ll need to know your subscriptionId, the name of the resource group you want your cluster to be created in, and the Azure region (data center). Press F5 to execute the script. It will take approximately 20 minutes to create the cluster.
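The template settings from steps 2 and 3 can also be checked programmatically, for example before a deployment. The following is only a minimal sketch and is not part of the sample repo: the function name Test-BackupRestoreTemplate is hypothetical, and the inline JSON is a trimmed stand-in for ServiceFabricCluster.json rather than the real file.

```powershell
# Hypothetical helper (not in the sample repo): checks that an ARM template
# enables the Backup/Restore service on its Service Fabric cluster resource.
function Test-BackupRestoreTemplate {
    param([Parameter(Mandatory)][string]$TemplateJson)

    $template = $TemplateJson | ConvertFrom-Json
    $cluster  = $template.resources |
        Where-Object { $_.type -eq 'Microsoft.ServiceFabric/clusters' } |
        Select-Object -First 1
    if (-not $cluster) { return $false }

    # The add-on must be listed in addonFeatures.
    $hasAddon = @($cluster.properties.addonFeatures) -contains 'BackupRestoreService'

    # fabricSettings must carry a non-empty encryption certificate thumbprint.
    $section = @($cluster.properties.fabricSettings) |
        Where-Object { $_.name -eq 'BackupRestoreService' }
    $hasCert = [bool](@($section.parameters) |
        Where-Object { $_.name -eq 'SecretEncryptionCertThumbprint' -and $_.value })

    return ($hasAddon -and $hasCert)
}

# Trimmed stand-in for the real template (illustration only).
$sample = @'
{
  "resources": [
    {
      "apiVersion": "2018-02-01",
      "type": "Microsoft.ServiceFabric/clusters",
      "properties": {
        "addonFeatures": ["BackupRestoreService"],
        "fabricSettings": [
          {
            "name": "BackupRestoreService",
            "parameters": [
              { "name": "SecretEncryptionCertThumbprint", "value": "[Thumbprint]" }
            ]
          }
        ]
      }
    }
  ]
}
'@

Test-BackupRestoreTemplate -TemplateJson $sample
```

To run it against the real file, pass (Get-Content .\ServiceFabricCluster.json -Raw) as the TemplateJson argument.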

Task 3: Review your cluster services

    1. Log in to the Azure portal and go to the resource group where your Service Fabric cluster resides.
    2. Click on the name of your cluster and then in the cluster blade, click on the link to open the Service Fabric Explorer.
    3. Expand the Services item in the treeview and you should see a BackupRestoreService system service listed.

image

Task 4: Deploy the Voting application

1. Open Visual Studio 2017 as an administrator and then open the Voting.sln solution in the root\Voting folder.

2. Rebuild the project to ensure that all the NuGet packages have been restored.

3. Right-click on the Voting project and select Publish.

4. In the Publish dialog, pick your subscription and your cluster name. Make sure you have the Cloud.xml profile and parameters file selected. Once you select your cluster name, you should see a green check once the VS publish mechanism connects to your cluster. If you see a red X instead, you can still try to publish and then look at the output to see what the actual error is. NOTE: If you see the red X, go in to the PublishProfiles\Cloud.xml file and make sure your cluster name and certificate thumbprint are listed there:

<ClusterConnectionParameters ConnectionEndpoint="<FQDN-of-your-cluster>:19000" X509Credential="true" ServerCertThumbprint="<cluster-thumbprint>" FindType="FindByThumbprint" FindValue="<cluster-thumbprint>" StoreLocation="CurrentUser" StoreName="My" />

5. Click the Publish button to publish the app to your Service Fabric cluster.

6. Log in to the Azure portal and go to your cluster resource. You should be able to see that the application has been deployed (after a few minutes) and you also want to make sure it is healthy. This can be determined by seeing a green check beside the status.

image

7. Prior to creating and enabling our backup policy, you need to make sure you have an Azure Storage account set up with a blob container to capture the data being backed up. In this example, I am going to use one of the storage accounts that the Service Fabric cluster uses. Normally, this is a bad idea for many reasons (I/O usage, space consumed, etc.), so you can create a separate storage account in your subscription if you wish. I will create a new blob container named ‘blobvotebackup’.

Task 5: Create and enable your backup policy

At this point, you have your cluster created, your app deployed and in a running healthy state. It’s time to create your backup policy and enable it.

1. In PowerShell ISE, open the Backup.ps1 file in the root\Assets folder.

2. There are several parameters to fill in here, all well commented. This script will create the backup policy and then enable it. Fill in your parameters.

image

If you scroll down through the script, you’ll see the configuration information for the backup policy.

# start setting up storage info
$StorageInfo = @{
    ConnectionString = $storageConnString
    ContainerName = $containerName
    StorageKind = 'AzureBlobStore'
}

# backup schedule info, backup every 5 minutes
$ScheduleInfo = @{
    Interval = 'PT5M'
    ScheduleKind = 'FrequencyBased'
}

$retentionPolicy = @{
    RetentionPolicyType = 'Basic'
    RetentionDuration = 'P10D'
}

# backup policy parameters
# After 5 incremental backups, do a full backup
$BackupPolicy = @{
    Name = $backupPolicyName
    MaxIncrementalBackups = 5
    Schedule = $ScheduleInfo
    Storage = $StorageInfo
    RetentionPolicy = $retentionPolicy
}

  • Note that the ‘StorageKind’ is AzureBlobStore. You could also choose an on-premises file store.
  • Also note the Interval, which sets how often a backup is taken, and the ScheduleKind of FrequencyBased. A policy can instead run at set times of day (a time-based schedule), and backups can also be triggered ad hoc. I’m setting mine to 5 minutes just to get the sample code going.
  • There is a retention policy that keeps the data for 10 days.
  • MaxIncrementalBackups tells the policy how many incremental backups to take before doing a new full backup. The backup service always starts with a full backup on a newly enabled policy.
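The Interval and RetentionDuration values are ISO 8601 durations, which are easy to mistype. One way to sanity-check them, sketched below, is to parse them with the .NET XmlConvert class, which understands that syntax. The cadence arithmetic at the end is my own rough estimate and assumes every scheduled backup succeeds and nothing else forces a full backup.

```powershell
# Interval and RetentionDuration use ISO 8601 duration syntax.
# [System.Xml.XmlConvert]::ToTimeSpan parses that syntax, which makes it
# easy to confirm what a value means before sending it to the cluster.
$interval  = [System.Xml.XmlConvert]::ToTimeSpan('PT5M')   # backup interval
$retention = [System.Xml.XmlConvert]::ToTimeSpan('P10D')   # retention window

"Backup every {0} minutes; keep backups for {1} days" -f `
    $interval.TotalMinutes, $retention.TotalDays

# Rough estimate (assumption: all scheduled backups succeed): one full backup
# followed by MaxIncrementalBackups incrementals makes up one cycle.
$maxIncremental   = 5
$minutesPerCycle  = ($maxIncremental + 1) * $interval.TotalMinutes
"Approximately one full backup every {0} minutes" -f $minutesPerCycle
```

A malformed value such as 'P5M' where 'PT5M' was intended would show up here as months rather than minutes, before it ever reaches the policy.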

Since we are using a PowerShell script to create and enable the backup policy, we are calling the BackupRestore REST APIs directly. Notice where the Create is taking place, how the URL is built to create the policy, and the API version being used.

$url = "https://" + $clusterName + ":19080/BackupRestore/BackupPolicies/$/Create?api-version=6.4"
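To make the Create call more concrete, here is a minimal sketch of how a policy hashtable can be turned into a REST request. It follows the ConvertTo-Json plus Invoke-WebRequest pattern used in the Microsoft quickstart listed in the References, but it is not the code from Backup.ps1: the function name New-BackupPolicyRequest, the policy name, the cluster name and the thumbprint are all placeholders I made up for illustration.

```powershell
# Hypothetical helper: builds the parameter set for the Create call so it
# can be inspected before anything is sent to the cluster.
function New-BackupPolicyRequest {
    param(
        [Parameter(Mandatory)][string]$ClusterName,      # FQDN of the cluster
        [Parameter(Mandatory)][hashtable]$BackupPolicy,  # same shape as $BackupPolicy above
        [Parameter(Mandatory)][string]$Thumbprint        # cluster certificate thumbprint
    )

    @{
        Uri                   = 'https://' + $ClusterName + ':19080/BackupRestore/BackupPolicies/$/Create?api-version=6.4'
        Method                = 'Post'
        ContentType           = 'application/json'
        # The default depth of 2 is enough for this shape; a larger value is
        # harmless and protects deeper structures from being stringified.
        Body                  = ConvertTo-Json $BackupPolicy -Depth 5
        CertificateThumbprint = $Thumbprint
    }
}

# Placeholder values for illustration only.
$policy = @{
    Name                  = 'DemoPolicy'
    MaxIncrementalBackups = 5
    Schedule              = @{ ScheduleKind = 'FrequencyBased'; Interval = 'PT5M' }
    Storage               = @{ StorageKind = 'AzureBlobStore'; ConnectionString = '<connection-string>'; ContainerName = 'blobvotebackup' }
    RetentionPolicy       = @{ RetentionPolicyType = 'Basic'; RetentionDuration = 'P10D' }
}
$request = New-BackupPolicyRequest -ClusterName 'mycluster.eastus.cloudapp.azure.com' `
    -BackupPolicy $policy -Thumbprint '<cluster-thumbprint>'

$request.Uri
# To send it (requires the certificate in the local store):
# Invoke-WebRequest @request
```

Building the request as a hashtable and splatting it keeps the URL, body and certificate in one place, which makes it easy to print and review what would be sent.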

Farther down in the script, you’ll see where the URL is created for the EnableBackup command. Notice that we are specifying to back up at the ‘Application’ level, meaning that if the app had more than one stateful service, they would all use the same backup policy. You can also enable backup at a partition or service level.

$url = "https://" + $clusterName + ":19080/Applications/" + $appName + "/$/EnableBackup?api-version=6.4"
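The three scopes differ only in the path of the URL. The sketch below builds the EnableBackup URL for each one; the function name Get-EnableBackupUrl is hypothetical, and the cluster, service and policy names are placeholders (I am assuming a stateful service named VotingData inside the Voting app). In the REST API, the application ID is the application name without the fabric:/ prefix, and nested names such as a service ID are delimited with ‘~’.

```powershell
# Hypothetical helper: builds the EnableBackup URL for each supported scope.
function Get-EnableBackupUrl {
    param(
        [Parameter(Mandatory)][string]$ClusterName,
        [Parameter(Mandatory)][ValidateSet('Application', 'Service', 'Partition')][string]$Scope,
        [Parameter(Mandatory)][string]$Id   # application ID, service ID, or partition GUID
    )
    $base = 'https://' + $ClusterName + ':19080'
    # Single quotes keep the literal "$" in the path from being expanded.
    return $base + '/' + $Scope + 's/' + $Id + '/$/EnableBackup?api-version=6.4'
}

$cluster = 'mycluster.eastus.cloudapp.azure.com'   # placeholder
$appUrl  = Get-EnableBackupUrl -ClusterName $cluster -Scope Application -Id 'Voting'
$svcUrl  = Get-EnableBackupUrl -ClusterName $cluster -Scope Service     -Id 'Voting~VotingData'
$partUrl = Get-EnableBackupUrl -ClusterName $cluster -Scope Partition   -Id '<partition-id>'
$appUrl, $svcUrl, $partUrl

# The request body is the same in each case: a reference to the policy by name.
$body = ConvertTo-Json @{ BackupPolicyName = 'DemoPolicy' }
```

Whichever scope is chosen, the body only names the policy; the schedule, storage and retention settings all live in the policy itself.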

3. Press F5 to execute the script and create/enable the backup policy. From this point on, backups are taken in the background on the policy’s schedule; allow up to 10 minutes for the first one to appear.

Task 6: Confirm that data is being backed up

  1. Go back to the Azure portal, to your storage account, and drill down to your backup blob container. If you click on the container name (after waiting at least 5 minutes), you’ll see how the full and incremental backups are laid out.

image

A couple of things to note:

  • You’ll see the blob container name in the upper left hand corner
  • You’ll see the ‘Location’, where you have the name of the blob, the name of the app, the name of the service in the app and then the partitionId.
  • For each backup, you have a .bkmetadata and .zip file.

2. To get a complete list of the backups, open the ListBackups.ps1 script in the root\Assets folder.

3. Fill in the parameters and press F5 to run the script. You should see a list of all the current backups, names, IDs, partition numbers, etc. This type of information will be important when you are ready to do a restore. Remember that each partition in a stateful service will have its own backup. You can also find a ListPartitionBackups.ps1 script in the root\Assets folder; just add your partitionID to the script parameters.

Below is a snapshot of the type of information you would see from running ListPartitionBackups.ps1:

image
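Once the list is back, it is often useful to pick out the newest backup for each partition. The sketch below shows one way to do that. The sample response is hand-written and heavily trimmed, with made-up values; the field names (Items, BackupId, BackupType, CreationTimeUtc, PartitionInformation.Id) reflect my understanding of the GetBackups response and should be checked against what ListBackups.ps1 actually returns for your cluster.

```powershell
# Hand-written sample shaped like a GetBackups response (values are made up).
# A real response would come from a GET on
#   https://<cluster>:19080/Applications/<appId>/$/GetBackups?api-version=6.4
$response = @'
{
  "ContinuationToken": "",
  "Items": [
    { "BackupId": "id-1", "BackupType": "Full",
      "CreationTimeUtc": "2019-01-10T10:00:00Z",
      "PartitionInformation": { "Id": "partition-a" } },
    { "BackupId": "id-2", "BackupType": "Incremental",
      "CreationTimeUtc": "2019-01-10T10:05:00Z",
      "PartitionInformation": { "Id": "partition-a" } },
    { "BackupId": "id-3", "BackupType": "Full",
      "CreationTimeUtc": "2019-01-10T10:01:00Z",
      "PartitionInformation": { "Id": "partition-b" } }
  ]
}
'@ | ConvertFrom-Json

# Group the backups by partition and keep the most recent one in each group.
$latest = $response.Items |
    Group-Object { $_.PartitionInformation.Id } |
    ForEach-Object { $_.Group | Sort-Object CreationTimeUtc | Select-Object -Last 1 }

$latest | Select-Object BackupId, BackupType
```

Because each partition has its own backups, grouping by partition ID first avoids accidentally comparing backups that belong to different partitions.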

Task 7: Disable and delete your backup policy

Now that you’ve had all the fun of seeing your service’s reliable collections being backed up in your blob container, you have a few choices. You can:

  • Suspend – this essentially just stops backups from being taken for a period of time. Suspension is thought of as a temporary action.
  • Resume – resuming a suspended backup
  • Enable – enabling a backup policy
  • Disable – use this when there is no longer a need to back up data from the reliable collection.
  • Delete – deletes the entire backup policy but your data still exists

One example of using a mix of the settings above is where you could enable a backup for an entire application but suspend or disable a backup for a particular service or partition in that application.

  1. The script RemoveBackup.ps1 in the root\Assets folder will suspend, disable, and delete the backup policy. Depending on what you want to do at this point, set breakpoints within the PowerShell script to first suspend, then disable the backup policy. You will notice that no more backups take place.
  2. Once you are finished with your tests, continue the script to delete the backup policy.
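The suspend, disable and delete steps map onto three REST calls, all of them POST requests. The sketch below only lists the URLs in the order described above; it does not send anything and is not the code from RemoveBackup.ps1. The function name Get-BackupTeardownSteps and the cluster, app and policy names are placeholders.

```powershell
# Hypothetical helper: lists the REST calls for winding a backup down,
# in the order described above. All of them are POST requests.
function Get-BackupTeardownSteps {
    param(
        [Parameter(Mandatory)][string]$ClusterName,
        [Parameter(Mandatory)][string]$AppId,
        [Parameter(Mandatory)][string]$PolicyName
    )
    $base = 'https://' + $ClusterName + ':19080'
    $api  = '?api-version=6.4'
    @(
        [pscustomobject]@{ Step = 'Suspend'; Url = $base + '/Applications/' + $AppId + '/$/SuspendBackup' + $api }
        [pscustomobject]@{ Step = 'Disable'; Url = $base + '/Applications/' + $AppId + '/$/DisableBackup' + $api }
        [pscustomobject]@{ Step = 'Delete';  Url = $base + '/BackupRestore/BackupPolicies/' + $PolicyName + '/$/Delete' + $api }
    )
}

# Placeholder values for illustration only.
$steps = Get-BackupTeardownSteps -ClusterName 'mycluster.eastus.cloudapp.azure.com' `
    -AppId 'Voting' -PolicyName 'DemoPolicy'
$steps | Format-Table -AutoSize
```

The order matters: as far as I know, a policy that is still enabled on an application, service or partition cannot be deleted, which is why the disable call comes before the delete. A suspended backup can be picked up again with the matching ResumeBackup call on the same application path.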

References

For further information on backup and restore, see https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-backuprestoreservice-quickstart-azurecluster

Client library usage: https://github.com/Microsoft/service-fabric-client-dotnet/blob/develop/docs/ClientLibraryUsage.md
