Sync a Managed Metadata Term Group with SharePoint Online or On-Premises
syncmms

Sync a Managed Metadata Term Group with SharePoint Online or On-Premises

Posted by on Saturday, June 14th, 2014  

 

The Scripts

>Download the updated scripts .zip file

>Download the original scripts .zip file

>Download the Webinar

MMS_PPT

(open in PowerPoint and play the slideshow to hear the presentation)

The Problem

In a multi-farm environment, it is likely that data will be transferred between farms, either as one-off copies or migrations of entire sites. Microsoft’s implementation of the Managed Metadata Service (MMS) and its taxonomy/term store poses a significant challenge in a multi-farm situation. When a term is stored, it is given a unique, automatically generated identifier (a GUID) which identifies the term. This term ID is used throughout the farm in lieu of the actual term itself, enabling the localization and renaming of the term. It is not possible, however, to set this ID through the user interface. The term “Department” might have an ID of “123-45-678” (yes, I know this isn’t a true GUID) in the on-premises term set – but then have an ID of “ABC-DE-F12” in a SharePoint Online term set. When an item tagged with “Department” is moved between environments, the term IDs no longer match, the term is orphaned, and the data is effectively lost.

 

What Do We Need to Solve the Problem?

In order to solve this problem, it is necessary to have a process which synchronizes the term sets between farms (or instances of Managed Metadata Services) while maintaining the original term ID. This is not necessarily a large challenge when synchronizing between local MMS instances, since you have the entirety of the server object model at your disposal (including all of the PowerShell cmdlets). In SharePoint Online, however, options are limited since Microsoft does not permit the use of custom solutions. Unfortunately, the REST API is not an option (at least at the time I’m writing this) because it has not yet been fully developed for managing Managed Metadata Services and the taxonomy. Thankfully, though, in SharePoint 2013 Microsoft has greatly enhanced the ability to manage the SharePoint Online term store remotely via the Client-Server Object Model (CSOM). We will use CSOM to sync the term store in SharePoint Online with the one in our local farm. Since we can use CSOM with on-premises SharePoint as well, we can also use this same solution to synchronize the term store across multiple on-premises farms or instances of MMS. For example, we can synchronize a term group with an external-facing farm in the DMZ, development environments, federated service farms, etc.

An Aside: A Major Update to the Original Blog

Recently, in light of my webinar on this topic, I decided to come back to the code and rework it to resolve various problems in the first iteration of the code/blog. If you grabbed the code, you will want to download the latest version. I have kept both on this blog in case you would like to compare the two versions for some lessons learned. The text of the blog below has been updated to reflect the newest version of the solution. The current version has incorporated the following changes:

  1. Added the ability to sync from any SharePoint 2013 or SharePoint Online source instead of only the local SharePoint farm. Along with this, included the ability to specify the credentials (including the use of the Secure Store Service) to connect with the source.
  2. Majorly refactored the Managed Metadata Service helper script to provide greater modularity and re-usability. Also separated the PowerShell job code into its own helper script. Tweaked several functions in other helper scripts.
  3. Re-engineered the CSOM calls to batch more commands and decrease calls to SharePoint. For example, included the ability to return a complete object and only initialize an object if it hasn’t already happened. This resulted in a runtime savings of 22% – 78%. In a development environment with complicated term sets with many properties, the following run times were observed. There was one very large, very deep term set which skewed the results, so I ran the sync with and without that term set. I also ran the tests with 2 and 4 threads and with and without the actual creation of the term group from scratch (new sync) vs. an update of the term group (delta sync). You will notice substantial improvements due to this different approach to CSOM. It had the greatest effect when processing the large term set vs. multiple smaller ones. Times can improve with more powerful hardware.
    Including very large term store Excluding very large term store
    New Sync Delta Sync Full Sync Delta Sync
    2 Threads 4 Threads 2 Threads 4 Threads 2 Threads 4 Threads 2 Threads 4 Threads
    Original 109 104 44 41 18.75 12.5 6 4.25
    New 78 42.5 16.5 8.75 13.75 9.75 4.5 3.25
    Savings 28.4% 59.1% 62.5% 78.7% 26.7% 22.0% 25.0% 23.5%
    * All times are in minutes
  4. The term group no longer has to already exist at the destination; the script will create it if not. The term group ID and other properties will be synchronized as well.
  5. Improved the handling of reused terms. Previously, if a reused term was found at the source but the term wasn’t found at the destination, the term would be created at the destination regardless of whether or not it was the originating term. In other words, if the source term was originally created in Term Set X but then reused in Term Set Y, then the destination term could be created in Term Set Y and then re-used in Term Set X. This was done due to the multi-threaded nature of the script (you never know which term set will sync first). Now, the definition of the terms at the source will be better respected. If a re-used term is encountered but the originating term hasn’t been created at the destination yet, the term will be skipped and a message will be displayed. Rerunning the sync should then correctly re-use the term, since the originating term should be created in its proper location by that point.

How is the Solution Implemented?

There are a lot of ways we can work with the remote term store via CSOM. For example, we can build a C# application or use JavaScript in the browser or in Node.js. I opted to use PowerShell. I’m a DevOps guy by nature. Having been in IT support/operations for 8+ years, I understand the challenges that come along with maintaining a system and keeping it operational. But I’ve also done enough light development to understand the great power and flexibility you get with a custom-built solution. I see PowerShell as a really excellent tool that satisfies the needs of both Operations and Development. In PowerShell, I can use cmdlets alongside the .Net object model to make just about anything I need but with the agility that comes from a scripting language. It may not be as elegant or as neatly packaged as a full C# application or service, but I can deploy, troubleshoot, and improve with much greater ease and speed. I see PowerShell as a way to give me the best of both worlds. Don’t get me wrong – there are negatives to PowerShell as well. You lose a lot when you move out of Visual Studio and into a script editor. ALM is a certainly a challenge, for example. PowerShell certainly isn’t the right choice for every situation, but it’s my favorite tool, so I decided to see if I could make it work.

Features

The following are some of the features of the solution:

  1. A given term group in a source term store will be synchronized with a remote term group via a push process. The terms and term sets in the source term group will be sent to the remote term group, where any differences are overwritten. Any customizations at the destination will be lost after the sync is run.
  2. Credentials to connect to SharePoint can be stored securely in the Secure Store Service (so, no storing the login/password in plain text). Although the technique is used in this solution, please see my blog Get More Out of Your Secure Store Service via PowerShell and C# for more about using the Secure Store Service to store credentials.
  3. The source and/or destination term group may be in SharePoint Online or in another MMS service application (be it in the same farm or external)
  4. Any means to run and schedule the script may be used. This could mean manually running the script, scheduling it via Windows Task Manager, or running it via an enterprise automation solution. It’s just a PowerShell script, after all.
  5. To decrease the runtime (since CSOM isn’t the fastest solution), individual term sets are synchronized in parallel via multi-threading. The number of threads running at a given time is fully configurable in the script.
  6. All properties of the term group, its term sets, and its child terms will be synchronized
  7. The wrapper/helper function SyncTermGroups.ps1, if used, helps with managing the logs generated by the process via purging old logs based on their age

Requirements

The following are the requirements of the script as it is currently written:

  1. If storing credentials in a Secure Store Service target application, then the synchronization script must be run from a SharePoint server in the farm containing the SSS instance
  2. If running the script through a Windows Scheduled Task, I recommend that, in order to gain redundancy, the task be created on two different servers with the task scheduled on alternative days
  3. The credentials of an Office 365 global administrator or a term store administrator are required. They can be stored in the Secure Store Service in a Target Application on the local farm, or they can an explicit login/password can be passed to the script.

Some Comments About the Solution and Lessons Learned

  1. This was my first attempt at using CSOM (let alone CSOM through PowerShell). As a result, it’s very much likely (quite probable, actual) that it is done inefficiently. I absolutely welcome any and all feedback so that my skills (and the readers’) can be improved.
  2. The solution has been broken out into multiple .ps1 files. The *Helper.ps1 scripts are a way to collect miscellaneous functions related to a service application in order to reuse them with other scripts. SyncTermGroups.ps1 is there to make life easier and provide logging and log management.
  3. SyncTermGroups.bat makes it easy to run the SyncTermGroups.ps1 script, especially if a scheduler is used. It simply uses PowerShell to call the script with the parameters to be used. To sync an additional term group, simply create a new .bat file and set the parameters appropriately or just add another PowerShell call to the existing file. The nice thing about having these parameters in the .bat file instead of, say, the parameters of a Windows Scheduled Task is that they can be changed without modifying the task itself (which may require hauling a password out of the vault). Make sure the objects are completely secure, though, to protect against unauthorized and/or malicious changes.
  4. Although not easy, it is technically possible to run the scripts from a non-SharePoint server. To do so, the DLLs referenced in the scripts need to be copied to the machine and the scripts updated to reflect their locations. The biggest challenge is with the SharePoint Management Shell. Normally, it is not possible to run the SharePoint cmdlets on a machine without SharePoint installed, but the second option described in this discussion can work.
  5. I found that the CSOM with SharePoint Online can be a bit unstable or flakey. Commands would fail but then work after a retry. Therefore, I wrapped all CSOM calls in a function which implements retries (the number of retries is set via the numRetries parameter).
  6. CSOM with SharePoint Online can be quite slow. Therefore, I did what I could to minimize unnecessary calls. For example, if a term already exists, I compare each property to identify if any changes are necessary. The term will only be updated if a discrepancy is found.
  7. Additionally, due to the slowness, I implemented multi-tasking via PowerShell jobs. A job is kicked off per term set in the term group, and the number of threads running at a time is determined by a parameter passed to the Sync-SPTermGroup function. Although it is possible to set this number as high as you want, it is important to understand the impact to the servers of doing so. When a job completes, the Verbose, Error, Output, and Warning logs (if they exist) are exported to a log file.
  8. The script initially used recursion to crawl through the hierarchy of a term set. However, I found that I was hitting the recursion limit in PowerShell when syncing very large term groups. So, even though it’s a pain and not at all elegant, I loop through all seven possible levels in order to make sure the entire hierarchy is synced.
  9. The scripts should be fully commented, so please make sure you review the scripts if you have questions about what it’s doing and how you could potentially reuse functions.

Deployment

The following describes what you need to do to deploy the solution as I created it. For simplicity and redundancy, I created a Windows Scheduled Task on two SharePoint application servers, each running SyncTermGroups.bat on alternate days. That way if one server were to die, the sync would continue. The sync can be run as often as you like, but keep in mind that a sync can take a while to run, especially if it’s a very large term group.

To deploy the solution, do the following:

  1. Identify the service account which will be running the sync scripts and the SharePoint Online account which will act as the connection account. If DirSync has been setup then both can be the same account (recommended for simplicity.)
  2. Copy the following to a folder on one or more SharePoint Servers (i.e. D:\TermSetSync): CSOMHelper.ps1, ManagedMetadataHelper.ps1, PSJobHelper.ps1, SecureStoreServiceHelper.ps1, SyncTermGroups.ps1, SyncTermGroups.bat (optional). Alternatively, copy the scripts to a shared location (like a NAS) and point the scheduler (if used) to the single set of files.
    1. If using SyncTermGroups.bat, make sure to update the parameters to meet your needs
  3. Configure the destination security. Do the following for SharePoint Online:
    1. In the SharePoint Online Admin Center, grant a valid SharePoint Online account Term Store Administrator permissions on the root-level node in the term store
    2. At a minimum, ensure that the connection account has at least Visitor permissions in the destination site (such as the root of the SharePoint Online site).
  4. Configure On-Premises Security
    1. In Central Administration, grant the sync account Term Store Administrator permissions on the root-level node in the Term Store
    2. In the PowerShell Management Shell, execute the following command to allow the sync account access to the term store via PowerShell:
      Add-SPShellAdmin -UserName "domain\SyncAccount" -database (Get-SPServiceApplication | Where-Object {$_.TypeName -like "*metadata*"}).Database
    3. On each App server where the job will run, grant the sync account “Log on as a batch job” permissions. Do so via the Local Security Policy or by executing the following command from the Windows Resource Kit:
      ntrights +r SeBatchLogonRight -u "domain\SyncAccount"
  5. If using the option to store the connection login/password in the Secure Store Service, then create a new Target Application in the on-premises Secure Store Service application. Make sure to update the .bat file (if used) with the name you chose for the Target Application. The most important thing is to make sure the field definitions match the table below and that the service account running the sync is added to the Members.
    Name Type Field Name Field Type Target Application Administrators Members
    SharePoint Online Term Set Sync Group UserName User Name Farm account and/or farm administrators domain\SyncAccount (the service account)
    Password Password
    1. Select the Target Application and set its credentials to those of the connection account (the account to which has Term Store Administrator permissions)

Wrap-Up

If, like me, you find yourself with the need to synchronize your MMS term groups with SharePoint Online, then I hope you’ll find this solution to be useful. Unfortunately, the only way to do so and maintain the term IDs is to do the sync programmatically. Because we’re dealing with a remote SharePoint, we must use client-side techniques to update the remote terms with the local ones. CSOM is our only means to do so until Microsoft has built out the REST API for the taxonomy. Although this might not be the best example of CSOM out there, it does the job and hopefully will provide a foundation for others to build on. Please share with us any improvements you have made.

Thanks for reading!

 

Disclaimer
The sample scripts are not supported under any Summit 7 Systems standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Summit 7 Systems further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Summit 7 Systems, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Summit 7 Systems has been advised of the possibility of such damages.

 

Posted by on Saturday, June 14th, 2014  

Subscribe to RSS Feed

Sign Up for Newsletter

4

4

comments

    Jul 02
    2014

    Tyler Huntley

    Great post. One issue though – the download link for the zip file containing the scripts appears to be broken – it says Download does not exist.

    Reply
      Jul 02
      2014

      Brian Laws

      Hi, Tyler! Thanks for the good word! And thank you for letting us know the link was broken. It should be fixed now.

      Reply
        Jul 02
        2014

        Tyler Huntley

        Indeed it is – thanks again.

        Reply

Leave a Reply