So What’s Actually In My Search Index? (Part 2)

Posted by on Monday, March 10th, 2014

Part 2 of 2

In this blog, the second of a two-part series, I hope to provide you with the tools to temporarily make available all of the properties found when crawling items in SharePoint 2013. In Part 1, I introduced a function to export to CSV all of the data in all of the Managed Properties for a given search. The goal was to maximize the value of your Index by discovering all that it has to offer.

At the end of the previous blog, I mentioned that there’s potentially much more out there that you can take advantage of. The crawler sees all of the various properties defined on the objects it comes across, and it creates Crawled Properties out of them. But these aren’t made available to the user through the Index until they are promoted to Managed Properties. Only a subset of Crawled Properties (roughly 20% out of the box) is made into (technically, mapped to) Managed Properties. But how do you know what else is out there? What are you missing out on? Perhaps there’s some really useful, juicy morsel that could make a refiner that’ll blow your users’ socks off.

This blog will give you two functions. One temporarily creates a Managed Property for each and every Crawled Property in your Search Service Application that your users can’t currently get at. The other removes those temporary Managed Properties afterwards.

Crawled and Managed Properties

First, let me quickly review the two key elements of the Search Index: Crawled Properties and Managed Properties.

Crawled Properties

When the search Crawler component crawls the content sources in the Search Service Application, it sees the various properties that are available on the objects. These could be metadata such as the creation date/time, path, creator, the item’s name, description, URL, author, MS Office document properties, etc. The Crawler automatically records these properties in the database as Crawled Properties. Only very basic information about these properties is stored, and no data is written to the Index.

Managed Properties

By itself, a Crawled Property isn’t terribly useful to SharePoint. Its data isn’t in the Index, so SharePoint can’t do anything with it yet. That’s where Managed Properties come in. A Managed Property is effectively a mini schema that defines how SharePoint should treat one or more Crawled Properties. Often a Managed Property is mapped to only one Crawled Property, but it doesn’t have to be: a single Managed Property can be mapped to several Crawled Properties. Think of a Managed Property as detailed metadata for its Crawled Properties.

The most significant thing about Managed Properties, though, is that they cause the data in a Crawled Property to actually be stored in the Index. If a Crawled Property has no Managed Property, then as far as the search consumer is concerned, the data might as well not exist. Hopefully you can see how important Managed Properties are to the user’s search experience. Take a look at this TechNet article if you’d like to know which Managed Properties are included out of the box in SharePoint 2013.

Managed Properties were pretty simple and basic in SharePoint 2010, but they take on far more depth and functionality in SharePoint 2013. In 2010, they’re not a whole lot more than a mapping of Crawled Properties to friendlier names (and, of course, storage in the Index). In 2013, though, you get to control things like the following, and more:

  • the data type
  • whether the data should be searchable
  • whether the data can be retrieved (you may want to search on a property but, because it’s sensitive, not allow users to see it)
  • whether the Managed Property can be used as a refiner in search results
  • whether the Managed Property can be used to sort search results
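To make the “mini schema” idea concrete, here is a minimal Python model of a SharePoint 2013 Managed Property and its mappings. It is only an illustration. The class and field names are hypothetical and are not the SharePoint object model, and the property names in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CrawledProperty:
    """In this model, a Crawled Property is identified by its category and name."""
    category: str
    name: str


@dataclass
class ManagedProperty:
    """Hypothetical model of the 2013 settings listed above."""
    name: str
    data_type: str = "Text"
    searchable: bool = False
    queryable: bool = False
    retrievable: bool = False
    refinable: bool = False
    sortable: bool = False
    # A single Managed Property may be mapped to several Crawled Properties.
    mappings: List[CrawledProperty] = field(default_factory=list)

    def usable_in_results(self) -> bool:
        """True if users can both query on the property and see its value."""
        return self.queryable and self.retrievable


# One Managed Property fed by two (made-up) Crawled Properties.
doc_owner = ManagedProperty(
    name="DocOwner",
    queryable=True,
    retrievable=True,
    refinable=True,
    mappings=[
        CrawledProperty("Office", "Owner"),
        CrawledProperty("SharePoint", "ows_DocOwner"),
    ],
)

# A sensitive property that can be searched on but not returned to users.
salary_band = ManagedProperty(name="SalaryBand", searchable=True, queryable=True)
```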

Managed Properties aren’t usable, though, until there has been a full crawl, so every time you create a new one, you need to run a full crawl. Since full crawls can be very painful (and potentially even prohibitive), it’s not uncommon for organizations to hold off on creating Managed Properties and batch them up for creation at one time. Some organizations even require that a requestor fully justify the need for a new Managed Property.

So What About All Those Crawled Properties?

In any Search Service Application (SSA), very few Crawled Properties are actually mapped to Managed Properties. In my experience, it’s generally about 20%. Additionally, there are likely some Managed Properties that exist but aren’t usable by your users, because they’re either not Queryable or not Retrievable. Aren’t you curious about what’s in these Crawled Properties? Is there something there that could be really useful and deliver more business value? Answering these questions can take an awful lot of work or knowledge of the data. Export-SPSearchResultsWithAllManagedProperties (see Part 1 of this series) could help here, but it only shows the data in existing Managed Properties.
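The following Python sketch illustrates the arithmetic behind that 20% figure. It is a model on plain data, not a SharePoint API. It computes the share of Crawled Properties mapped to at least one Managed Property and lists the Managed Properties that users can’t fully use. The input shapes and property names are assumptions made for this example.

```python
def survey_schema(crawled, managed):
    """
    crawled: list of (category, name) tuples, one per Crawled Property.
    managed: dict of Managed Property name -> dict with "queryable" and
        "retrievable" (bools) and "mappings" (list of (category, name) tuples).
    Returns (share_mapped, unmapped, not_usable).
    """
    # Every Crawled Property that feeds at least one Managed Property.
    mapped = {cp for mp in managed.values() for cp in mp["mappings"]}
    unmapped = [cp for cp in crawled if cp not in mapped]
    share_mapped = (len(crawled) - len(unmapped)) / len(crawled) if crawled else 0.0
    # Managed Properties that exist but cannot be both queried and retrieved.
    not_usable = sorted(
        name for name, mp in managed.items()
        if not (mp["queryable"] and mp["retrievable"])
    )
    return share_mapped, unmapped, not_usable


crawled = [("Basic", "A"), ("Basic", "B"), ("Office", "C"),
           ("Office", "D"), ("SharePoint", "E")]
managed = {
    "Visible": {"queryable": True, "retrievable": True,
                "mappings": [("Basic", "A")]},
    "Hidden": {"queryable": True, "retrievable": False, "mappings": []},
}
share, unmapped, not_usable = survey_schema(crawled, managed)
print(f"{share:.0%} mapped, {len(unmapped)} unmapped, not usable: {not_usable}")
# prints: 20% mapped, 4 unmapped, not usable: ['Hidden']
```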

Until now.

Ok, perhaps that’s overly dramatic. But I’ve created a set of PowerShell functions that can be used in conjunction with Export-SPSearchResultsWithAllManagedProperties to get you all of this data. First, though, a warning.

IMPORTANT WARNING!

There is good reason why not every Crawled Property is made into a Managed Property: each Managed Property comes at a cost. When the Crawler processes an item, it has to send all of the Managed Property data to the Index component. This adds load to the Crawler and can slow it down. The Index component then has to process and store that data, which adds more overhead. On top of that, the stored data takes up space. Depending on how many Managed Properties you have and how many items are in your Index, it can substantially increase the amount of disk you need.

THEREFORE, use the scripts wisely. I do not recommend that you use them in Production. The scripts create a new Managed Property for every unmapped Crawled Property. They also create a new Managed Property for every existing Managed Property that isn’t currently marked as Retrievable or Queryable. That’s a lot of new Managed Properties. Also, whenever Managed Properties are added or removed, you need to run a full crawl of all of your content to get the data into (or out of) the Index. Full crawls are not to be taken lightly, especially if there are many items to crawl, and this process requires two of them: one after adding the properties and one after removing them. So I repeat: do not use these scripts in a production environment unless you actually know what you’re doing and are willing to bear the consequences. You have been warned.

Hopefully you have been sufficiently frightened. Keep reading, though, if you’re still interested in giving this a try.

To get all of these Crawled Properties into the Index, and thus usable by your searches, you need to map them to Managed Properties. Doing this by hand can be prohibitively painful, but PowerShell comes to our rescue once again. I created a PowerShell function, Promote-SPUnmappedCrawledPropertiesTemporarily, that takes all of the unmapped Crawled Properties and creates Managed Properties out of them. It also finds all Managed Properties that are not Queryable or not Retrievable and creates new Managed Properties with those attributes enabled. Again, use this function with great care!

After the script finishes, kick off a full crawl and wait for it to complete. Then use the Export-SPSearchResultsWithAllManagedProperties function to export the search results, with all of their Managed Properties, to a CSV file for analysis. Afterwards, use the second function, Demote-SPUnmappedCrawledPropertiesTemporarily, to remove the Managed Properties created previously. Finally, run another full crawl to clear all of those temporary Managed Properties out of the Index.

Specifically, Promote-SPUnmappedCrawledPropertiesTemporarily does the following:

  • Get the Search Service Application (SSA) with the name provided to the function
  • Get all of the Crawled Properties in the SSA which are not mapped to a Managed Property
  • Get all of the Managed Properties in the SSA which are not Queryable or are not Retrievable
  • For each of the properties found above:
    • Create a Managed Property with a name like “.temporary./[CrawledCategoryName]/[CrawledPropertyName]”. The “/” will help you parse the name so that you can easily identify the Crawled Property when looking at the CSV. The Managed Property will be created as Multiple Lines of Text, Queryable, and Retrievable.
    • Map the Crawled Property to the new Managed Property
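The steps above can be modeled in a few lines of Python. This is a hedged sketch of the selection and naming logic only. It assumes two inputs have already been gathered as (category, name) tuples: the unmapped Crawled Properties, and the Crawled Properties behind non-Queryable or non-Retrievable Managed Properties. It returns a plan rather than calling SharePoint. The function names are hypothetical and are not taken from the actual script.

```python
TEMP_PREFIX = ".temporary."


def temp_property_name(category, crawled_name):
    """Build a name following the pattern .temporary./[Category]/[Name]."""
    return f"{TEMP_PREFIX}/{category}/{crawled_name}"


def plan_promotions(unmapped, hidden_managed):
    """
    unmapped: iterable of (category, name) Crawled Properties with no mapping.
    hidden_managed: dict of existing Managed Property name -> list of the
        (category, name) Crawled Properties mapped to it, for Managed
        Properties that are not Queryable or not Retrievable.
    Returns a dict of new Managed Property name -> settings, in input order.
    """
    candidates = list(unmapped)
    for mapped in hidden_managed.values():
        candidates.extend(mapped)

    plan = {}
    for category, name in candidates:
        new_name = temp_property_name(category, name)
        # A Crawled Property reached more than once gets one temporary property.
        plan.setdefault(new_name, {
            "crawled_property": (category, name),
            "type": "Multiple Lines of Text",  # as described in the post
            "queryable": True,
            "retrievable": True,
        })
    return plan


plan = plan_promotions(
    [("Office", "Subject"), ("Basic", "42")],
    {"HiddenProp": [("SharePoint", "ows_Notes"), ("Office", "Subject")]},
)
for name in plan:
    print(name)
```

In this example, ("Office", "Subject") shows up both as unmapped and behind a hidden property, but it yields only one temporary property.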

Demote-SPUnmappedCrawledPropertiesTemporarily undoes the work of the above function. Specifically, it does the following:

  • Get the SSA with the name provided to the function
  • Get all of the Managed Properties with a name that begins with “.temporary.”
  • For each of these:
    • Get all of the Crawled Properties mapped to the Managed Property (should be only one)
    • Un-map each of the mapped Crawled Properties
    • Remove the Managed Property
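The following minimal Python sketch models the clean-up pass under the same assumptions: plain data instead of SharePoint objects, and hypothetical names. It selects only properties whose names begin with “.temporary.” and queues their mappings for removal before the properties themselves. It also flags any temporary property that doesn’t have exactly one mapping, since the post expects only one.

```python
TEMP_PREFIX = ".temporary."


def plan_demotions(managed):
    """
    managed: dict of Managed Property name -> list of mapped (category, name)
        Crawled Properties, covering every Managed Property in the SSA.
    Returns (mappings_to_remove, properties_to_remove, unexpected).
    """
    mappings_to_remove, properties_to_remove, unexpected = [], [], []
    for name, mapped in managed.items():
        if not name.startswith(TEMP_PREFIX):
            continue  # skip everything without the temporary prefix
        if len(mapped) != 1:
            unexpected.append(name)  # worth a look before deleting
        # Un-map every Crawled Property first, then remove the property itself.
        mappings_to_remove.extend((name, cp) for cp in mapped)
        properties_to_remove.append(name)
    return mappings_to_remove, properties_to_remove, unexpected


managed = {
    "Title": [("Basic", "2")],
    ".temporary./Office/Subject": [("Office", "Subject")],
    ".temporary./Basic/42": [],
}
unmap, remove, odd = plan_demotions(managed)
```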

Again, after running Promote-SPUnmappedCrawledPropertiesTemporarily and completing a full crawl, use Export-SPSearchResultsWithAllManagedProperties to perform a search and export the data in the Managed Properties to a CSV file.
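The “/” separators make it easy to trace the temporary columns in the exported CSV back to their Crawled Properties. The following Python sketch assumes the export writes one header row whose column names are the Managed Property names. That is an assumption about Part 1’s output format. The sketch splits each temporary column into a category and a Crawled Property name. It also counts how many rows actually contain a value, which is a quick way to spot the properties worth a closer look.

```python
import csv
import io

TEMP_PREFIX = ".temporary./"


def summarize_temp_columns(csv_text):
    """
    Returns {column: (category, crawled_name, filled_rows)} for every
    column whose header starts with ".temporary./".
    """
    # Windows PowerShell's Export-Csv writes a "#TYPE ..." line first unless
    # -NoTypeInformation is used; drop it if present.
    if csv_text.startswith("#TYPE"):
        csv_text = csv_text.partition("\n")[2]

    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        if not column.startswith(TEMP_PREFIX):
            continue
        # Split on the first "/" only, so a "/" inside the Crawled
        # Property name stays part of the name.
        category, _, crawled_name = column[len(TEMP_PREFIX):].partition("/")
        # Short rows get None from DictReader, so treat None as empty.
        filled = sum(1 for row in rows if (row[column] or "").strip())
        summary[column] = (category, crawled_name, filled)
    return summary


sample = (
    "Title,.temporary./Office/Author,.temporary./Web/a/b\n"
    "Doc1,Alice,\n"
    "Doc2,,x\n"
)
for column, (category, name, filled) in summarize_temp_columns(sample).items():
    print(f"{category:10} {name:10} {filled} row(s) with data")
```

For a real export, read the file with `open(path, newline="")` and pass its contents in.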

To use the functions, open a PowerShell window as Administrator and dot-source the script as you would any other external script (. .\ManageSPSearchResultsWithAllManagedProperties.ps1). Don’t forget the dots and the space. Once the script is loaded, call the functions like any other PowerShell function, passing each one the SSA name as you see it on the Manage Service Applications page in Central Administration. For further help, use the standard PowerShell Get-Help cmdlet (including the -Examples, -Detailed, and -Full switches) from a PowerShell command line.

ManageSPSearchResultsWithAllManagedProperties.ps1

Here are the functions. Copy the following and save it as a .ps1 file, or paste it directly into a PowerShell window.

Conclusion

Armed with the scripts from both parts of this series, you should now be well-equipped to take greater control of your Index and maximize its value for your users. I strongly suspect that you’ll find data in these properties that you can use to great advantage. If you do, please post your discoveries in the comments below. We’d all love to know about any really useful Crawled Properties that we should be adding to our Managed Properties.

I really hope you found this little series helpful. Thanks for reading!

Disclaimer
The sample scripts are not supported under any Summit 7 Systems standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Summit 7 Systems further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Summit 7 Systems, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Summit 7 Systems has been advised of the possibility of such damages.
