Skip to main content

Migrating taxonomy terms and multivalue fields in Drupal

Today we continue the conversation about migration dependencies with a hierarchical taxonomy terms example. Along the way, we will present the process and syntax for migrating into multivalue fields. The example consists of two separate migrations. One to import taxonomy terms accounting for term hierarchy. And another to import into a multivalue taxonomy term field. Following this approach, any node and taxonomy term created by the migration process will be removed from the system upon rollback.

Syntax for multivalue field migration.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is UD multivalue taxonomy terms whose machine name is ud_migrations_multivalue_terms. The two migrations to execute are udm_dependencies_multivalue_term and udm_dependencies_multivalue_node. Notice that both migrations belong to the same module. Refer to this article to learn where the module should be placed.

The example assumes Drupal was installed using the standard installation profile. Particularly, a Tags (tags) taxonomy vocabulary, an Article (article) content type, and a Tags (field_tags) field that accepts multiple values. The words in parenthesis represent the machine name of each element.

Migrating taxonomy terms and their hierarchy

The example data for the taxonomy terms migration is fruits and fruit varieties. Each row will contain the name and description of the fruit. Additionally, it is possible to define a parent term to establish hierarchy. For example, “Red grape” is a child of “Grape”. Note that no numerical identifier is provided. Instead, the value of the name is used as a string identifier for the migration. If term names could change over time, it is recommended to have another column that did not change (e.g., an autoincrementing number). The following snippet shows how the source section is configured:

source:
  plugin: embedded_data
  data_rows:
    - fruit_name: 'Grape'
      fruit_description: 'Eat fresh or prepare some jelly.'
    - fruit_name: 'Red grape'
      fruit_description: 'Sweet grape'
      fruit_parent: 'Grape'
    - fruit_name: 'Pear'
      fruit_description: 'Eat fresh or prepare a jam.'
  ids:
    fruit_name:
      type: string

The destination is quite short. The target entity is set to taxonomy terms. Additionally, you indicate which vocabulary to migrate into. If you have terms that would be stored in different vocabularies, you can use the vid property in the process section to assign the target vocabulary. If you write to a single one, the default_bundle key in the destination can be used instead. The following snippet shows how the destination section is configured:

destination:
  plugin: 'entity:taxonomy_term'
  default_bundle: tags

For the process section, three entity properties are set: name, description, and parent. The first two are strings copied directly from the source. In the case of parent, it is an entity reference to another taxonomy term. It stores the taxonomy term id (tid) of the parent term. To assign its value, the migration_lookup plugin is configured similar to the previous example. The difference is that, in this case, the migration to reference is the same one being defined. This sets an important consideration. Parent terms should be migrated before their children. This way, they can be found by the lookup operation. Also note that the lookup value is the term name itself, because that is what this migration set as the unique identifier in the source section. The following snippet shows how the process section is configured:

process:
  name: fruit_name
  description: fruit_description
  parent:
    plugin: migration_lookup
    migration: udm_dependencies_multivalue_term
    source: fruit_parent

Technical note: The taxonomy term entity contains other properties you can write to. For a list of available options check the baseFieldDefinitions() method of the Term class defining the entity. Note that more properties can be available up in the class hierarchy.

Migrating multivalue taxonomy terms fields

The next step is to create a node migration that can write to a multivalue taxonomy term field. To stay on point, only one more field will be set: the title, which is required by the node entity. Read this change record for more information on how the Migrate API processes Entity API validation. The following snippet shows how the source section is configured for the node migration:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      thoughtful_title: 'Amazing recipe'
      fruit_list: 'Green apple, Banana, Pear'
    - unique_id: 2
      thoughtful_title: 'Fruit-less recipe'
  ids:
    unique_id:
      type: integer

The fruits column contains a comma separated list of taxonomies to apply. Note that the values match the unique identifiers of the taxonomy term migration. If you had used numbers as migration identifiers there, you would have to use those numbers in this migration to refer to the terms. An example of that was presented in the previous post. Also note that there is one record that has no terms associated. This will be considered during the field mapping. The following snippet shows how the process section is configured for the node migration:

process:
  title: thoughtful_title
  field_tags:
    - plugin: skip_on_empty
      source: fruit_list
      method: process
      message: 'Row does not contain fruit_list.'
    - plugin: explode
      delimiter: ','
    - plugin: callback
      callable: trim
    - plugin: migration_lookup
      migration: udm_dependencies_multivalue_term
      no_stub: true

The title of the node is a verbatim copy of the thoughtful_title column. The Tags fields, mapped using its machine name field_tags, uses three chained process plugins. The skip_on_empty plugin reads the value of the fruit_list column and skips the processing of this field if no value is provided. This is done to accommodate the fact that some records in the source do not specify tags. Note that the method configuration key is set to process. This indicates that only this field should be skipped and not the entire record. Ultimately, tags are optional in this context and nodes should still be imported even if no tag is associated.

The explode plugin allows you to break a string into an array, using a delimiter to determine where to make the cut. Later, the callback plugin will use the trim PHP function to remove any space from the start or end of the exploded taxonomy term name. Finally, this array is passed to the migration_lookup plugin specifying the term migration as the one to use for the lookup operation. Again, the taxonomy term names are used here because they are the unique identifiers of the term migration. The `no_stub` configuration should be set to `true` to prevent terms to be created if they are not found by the plugin. This would not occur in the example because we make sure a match is found. If we did not set this configuration and we do not include the trim step, some new terms would be created with spaces at the beginning. Note that neither of these plugins has a source configuration. This is because when process plugins are chained, the result of one plugin is sent as source to be transformed by the next one in line. The end result is an array of taxonomy term ids that will be assigned to field_tags. The migration_lookup is able to process single values and arrays.

The last part of the migration is specifying the process section and any dependencies. Refer to this article for more details on setting migration dependencies. The following snippet shows how both are configured for the node migration:

destination:
  plugin: 'entity:node'
  default_bundle: article
migration_dependencies:
  required:
    - udm_dependencies_multivalue_term
  optional: []

More syntactic sugar

One way to set multivalue fields in Drupal migrations is assigning its value to an array. Another option is to set each value manually using field deltas. Deltas are integer numbers starting with zero (0) and incrementing by one (1) for each element of a multivalue field. Although you could set any delta in the Migrate API, consider the field definition in Drupal. It is possible that limits had been set to the number of values a field could hold. You can specify deltas and subfields at the same time. The full syntax is field_name/field_delta/subfield. The following example shows the syntax for a multivalue image field:

process:
  field_photos/0/target_id: source_fid_first
  field_photos/0/alt: source_alt_first
  field_photos/1/target_id: source_fid_second
  field_photos/1/alt: source_alt_second
  field_photos/2/target_id: source_fid_third
  field_photos/2/alt: source_alt_third

Manually setting a multivalue field is less flexible and error-prone. In today’s example, we showed how to accommodate for the list of terms not being provided. Imagine having to that for each delta and subfield combination, but the functionality is there in case you need it. In the end, Drupal offers more syntactic sugar so you can write shorted field mappings. Additionally, there are various process plugins that can handle arrays for setting multivalue fields.

Note: There are other ways to migrate multivalue fields. For example, when using the entity_generate plugin provided by Migrate Plus, there is no need to create a separate taxonomy term migration. This plugin is able to create the terms on the fly while running the import process. The caveat is that terms created this way are not deleted upon rollback.

What did you learn in today’s blog post? Have you ever done a taxonomy term migration before? Were you aware of how to migrate hierarchical entities? Did you know you can manually import multivalue fields using deltas? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

Next: Migrating users into Drupal - Part 1

This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.

Comments

2020 May 22
Renaud

Permalink

Hello Mauricio, In 09, the…

Hello Mauricio,

In 09, the udm_dependencies_multivalue_term migration works as expected and produces a hierarchical taxonomy tree. But the udm_dependencies_multivalue_node migration does not use these terns, adding instead its own terms to the tags vocabulary, creating 5 duplicate terms in the process.

I spent a few hours trying to debug & fix this but I was not successful.

Renaud

2020 May 23
Renaud

Permalink

Got it! Solved. In udm…

Got it! Solved.

In udm_dependencies_multivalue_node:28 the delimiter was missing a space after the comma.

  • before: delimiter: ','
  • after: delimiter: ', '

p.s. On that page I'm counting 7 occurrences of <code> something</code>.

2020 May 27
Renaud

Permalink

I was wondering why 5 tags…

I was wondering why 5 tags were duplicated on import using the given code. I realized that this was because of leading spaces for nodes with multiple tags. Changing the delimiter to ', ' fixed my import issue.

But in the next tutorial I discovered the better/proper way to process these unwanted spaces and that is by adding a callback plugin after the delimiter with a trim function.

- plugin: explode
  delimiter: ','
- plugin: callback
  callable: trim

Now the import works and the migration file is proper. :)

2020 May 28
Mauricio Dinarte

Permalink

Hello Renaud, thank you very…

Hello Renaud, thank you very much for your comments!

You are totally right, we failed to include the necessary trim as part of the process pipeline. We actually do this at https://agaric.coop/blog/understanding-entitylookup-and-entitygenerate-…

The extra taxonomy terms are created as a result of two things:

  1. By not trimming the result of the explode operation, you would get some terms with a space at the beginning and those would not be match by the migration_lookup.
  2. If the migration_lookup does not find an entity, it creates a stub by default. In this case, it would effectively create a new taxonomy term when the intended behavior is to only make the assignment if a term already exist in the system from a previous migration. To prevent this, the no_stub key should be set to true.

We have updated the example code to make to include the trim and the no_stub key. By the way, we also changed one term in the node migration from 'Green grape' to 'Grape' because the former was not imported in the other migration.

Thanks for taking the time to debug the problem and share your solution. It would be good to know what steps you followed to debug the problem and how you fixed it. :)

The <code></code> texts were escaped HTML tags. It is a formatting issue we have also fixed.

For your great feedback over various articles, we would like to give you a free copy of the ebook version of the series https://udrupal.com/31m-book Feel free to reach out via our contact form at https://agaric.coop/contact

Add new comment

The content of this field is kept private and will not be shown publicly.

The comment language code.

Upcoming Trainings from Agaric