Understanding the syntax of Drupal migrations
In the 31 days of Drupal migrations series, we explained different aspects of the syntax used by the Migrate API. In today’s article, we are going to dive deeper to understand how the API interprets our migration definition files. We will explain how to configure process plugins and set subfields and deltas for multi-value field migrations. We will also talk about process plugin chains, source constants, pseudofields, and the process pipeline. After reading this article, you will better comprehend existing migration definition files and improve your own. Let’s get started.
Field mappings: process plugin configuration
The Migrate API provides syntactic sugar to make migration definition files more readable. The field mappings under the `process` section are a good example of this. To demonstrate the syntax, consider a multi-value Link field to store links to online profiles. The field machine name is `field_online_profiles` and it is configured to accept the URL and the link text. For brevity, only the `process` section will be shown, but it is assumed that the source includes the following columns: `source_drupal_profile`, `source_gitlab_profile`, and `source_github_profile`.
process:
  field_online_profiles: source_drupal_profile
In this case, we are directly assigning the value from `source_drupal_profile` in the source to the `field_online_profiles` field in the destination entity. For now, we are ignoring the fact that the field accepts multiple values. We are not setting the link text either, just the URL. Even in this example, the Migrate API is making some assumptions for us. Every field mapping requires at least one process plugin to be configured. If none is set, the `get` plugin is assumed. It copies a value from the source to the destination without making any changes. The previous snippet is equivalent to the next one:
process:
  field_online_profiles:
    plugin: get
    source: source_drupal_profile
Process plugin configuration options should be placed as direct children of the field that is being mapped. In the previous snippet, `plugin` and `source` are indented one level to the right under `field_online_profiles`. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:
process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3
Check out the article on using process plugins for data transformation for a working example.
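As one concrete instance of this general form, the following sketch configures the core `default_value` plugin together with a `source`. It assumes, purely for illustration, that a fallback URL is acceptable for rows where `source_drupal_profile` is empty; the URL itself is a placeholder and not part of the original example.

```yaml
process:
  field_online_profiles:
    # Process plugin to run for this field mapping.
    plugin: default_value
    # Source column to read the value from.
    source: source_drupal_profile
    # Hypothetical fallback, used only when the source value is empty.
    default_value: 'https://www.drupal.org'
```

Every key other than `plugin` is a configuration option interpreted by the plugin itself, so the available options depend on which plugin is used.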
Field mappings: setting subfields
Let's expand the example by setting a value for the link text in addition to the URL. To accomplish this, we will migrate data into subfields. Fields can store complex data and in many cases they have multiple components. For example, a rich text field has a subfield to store the text value and another for the text format. Address fields have 13 subfields available. Our example uses Link fields which have three subfields:
uri: The URI of the link.
title: The link text.
options: Serialized array of options for the link.
For now, only the `uri` and `title` subfields will be set. This also demonstrates that, depending on the field, it is not necessary to provide values for all the subfields. One more thing we will implement is to include the name of the online profile in the link text. For example: “Drupal.org profile”.
process:
  field_online_profiles/uri: source_drupal_profile
  field_online_profiles/title:
    plugin: default_value
    default_value: 'Drupal.org profile'
If you want to set a value for a subfield, you use the `field_name/subfield` syntax. Then, each subfield can define its own mapping. Note that when setting the `uri`, we are taking advantage of the `get` plugin being the default to simplify the value assignment. In the case of `title`, the `default_value` process plugin is used to set a fixed value to comply with our example requirement.
When setting subfields, it is very important to understand what format is expected. You need to make sure the process plugins return data in the expected format or the migration will fail. In particular, you need to know if they return a scalar value or an array. In the case of scalar values, you need to verify if numbers or strings are expected. In the previous example, the `uri` subfield of the Link field expects a string containing the URL. On the other hand, File fields have a `target_id` subfield that expects an integer representing the File ID that is being referenced. Some process plugins might return an array or let you set subfields directly as part of the plugin configuration. For an example of the latter, have a look at the article on migrating images using the `image_import` plugin. `image_import` lets you set the `alt`, `title`, `width`, and `height` subfields for images directly in the plugin configuration. The following snippet shows a generalization for setting subfields:
process:
  destination_field/subfield_1:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/subfield_2:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
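To make the generalization concrete, here is a sketch for the rich text case mentioned earlier, where one subfield stores the text and another stores the text format. It assumes a hypothetical `source_body` column and that a text format with the machine name `restricted_html` exists on the destination site; neither is part of the running example.

```yaml
process:
  # The text itself: a string copied as is by the default get plugin.
  body/value: source_body
  # The text format: a fixed machine name (assumed to exist on the site).
  body/format:
    plugin: default_value
    default_value: restricted_html
```

Both subfields expect strings here. A field whose subfield expects an integer, such as `target_id` on File fields, would need a pipeline that returns a number instead.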
If a field can have multiple subfields, how can I know which ones are available? For easy reference, our next blog post will include a list of subfields for different types of fields. To find out by yourself, check out this article that covers available subfields. In summary, you need to locate the class that provides the `FieldType` plugin and inspect its `schema` method. The latter defines the database columns used by the field to store its data. Because of object-oriented practices, sometimes you need to look at the parent class to know all the subfields that are available. When migrating into subfields, you are actually migrating into those particular database columns. Any restriction set by the database schema needs to be respected. Link fields are provided by the `LinkItem` class whose `schema` method defines the three subfields we listed before.
If a field can have multiple subfields, how does the Migrate API know which one to set when none is manually specified? Every Drupal field has at least one subfield. If it has more, the field type itself specifies which one is the default. For easy reference, our next blog post will indicate the default subfield for different types of fields. To find out by yourself, check out this article that covers default subfields. In summary, you need to locate the class that provides the `FieldType` plugin and inspect its `mainPropertyName` method. Its return value will be the default subfield used by the Migrate API. Because of object-oriented practices, sometimes you need to look at the parent class to find the method that defines the default subfield. Link fields are provided by the `LinkItem` class whose `mainPropertyName` method returns `uri`. That is why in the first example there was no need to specify a subfield to set the value for the link URL.
Field mappings: setting deltas for multi-value fields
Once more, let’s expand the example by populating multiple values for the same field. To accomplish this, we will specify field deltas. A delta is a numeric index starting at 0 and incrementing by 1 for each subsequent element in the multi-value field. Remember that our example assumes that the source has the following columns: `source_drupal_profile`, `source_gitlab_profile`, and `source_github_profile`. One way to migrate all of them into the multi-value link field is:
process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title:
    plugin: default_value
    default_value: 'Drupal.org profile'
  field_online_profiles/1/uri: source_gitlab_profile
  field_online_profiles/1/title:
    plugin: default_value
    default_value: 'GitLab profile'
  field_online_profiles/2/uri: source_github_profile
  field_online_profiles/2/title:
    plugin: default_value
    default_value: 'GitHub profile'
If you want to set a value for a subfield of a specific element in a multi-value field, you use the `field_name/delta/subfield` syntax. Then, every combination of delta and subfield can define its own mapping. Both `delta` and `subfield` are optional. If no `delta` is specified, 0 is assumed, which corresponds to the first element of a (multi-value) field. If no `subfield` is specified, the default subfield is assumed as explained before. In the previous example, if there is no need to set the link text, the configuration would become:
process:
  field_online_profiles/0: source_drupal_profile
  field_online_profiles/1: source_gitlab_profile
  field_online_profiles/2: source_github_profile
In this example, we wanted to highlight syntax variations that can be used with the Migrate API. Nevertheless, this way of migrating multi-value fields is not very flexible. You are required to know in advance how many deltas you want to migrate. Depending on your particular configurations, you can write complex process pipelines that take into account an unknown number of deltas. Sometimes, writing a custom migration process plugin is easier and/or the only option to accomplish a task. Even if you can write a migration with existing process plugins, that might not be the best solution. When writing migrations, strive for them to be easy to read, understand, and maintain. For reference, the generic configuration for mapping fields with deltas and subfields is:
process:
  destination_field/0/subfield_1:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/0/subfield_2:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/1/subfield_1:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
  destination_field/1/subfield_2:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
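When the number of values is not known in advance, one possible alternative is the core `sub_process` plugin, which applies a nested mapping to every item of a list. The sketch below is only an illustration under strong assumptions: it presumes a hypothetical `source_profiles` column whose value is a list of items, each with hypothetical `url` and `label` keys. Whether your source data can be shaped this way depends on the source plugin in use.

```yaml
process:
  field_online_profiles:
    plugin: sub_process
    # Hypothetical source column holding a list of items such as:
    #   - { url: 'https://example.com/u/alice', label: 'Example profile' }
    source: source_profiles
    # Mapping applied to each item; every item becomes one delta.
    process:
      uri: url
      title: label
```

With this approach no delta is written by hand. The trade-off is that the nested `process` section is harder to read than the explicit version, so it is worth weighing against a custom process plugin.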
Process plugin chains
So far, for every `field_name/delta/subfield` combination we have only used one process plugin. The Migrate API does not impose any restrictions on the number of transformations that the source data can undergo before being assigned to a destination property or field. You can have as many as needed. Chaining of process plugins works similarly to Unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned. We have covered this topic in greater detail in the article on using process plugins for data transformation. For now, let’s consider an example chain of two process plugins:
process:
  title:
    - plugin: concat
      source:
        - source_first_name
        - source_last_name
      delimiter: ' '
    - plugin: callback
      callable: strtoupper
In this example, we are using the `concat` plugin to glue together the `source_first_name` and `source_last_name`. A space is placed in between as specified by the `delimiter` configuration. The result is then passed to the `callback` plugin, which executes the `strtoupper` PHP function on the concatenated value, effectively making the string uppercase. Because there are no more process plugins in the chain, the string transformed to uppercase is assigned to the `title` destination property. If `source_first_name` is ‘Mauricio’ and `source_last_name` is ‘Dinarte’, then `title` would be set to ‘MAURICIO DINARTE’. Refer to the article mentioned before for other things to consider when manipulating strings. The configuration of process plugin chains can be generalized as follows:
process:
  destination_field:
    - plugin: plugin_name
      source: source_column_name
      config_1: value_1
      config_2: value_2
    - plugin: plugin_name
      config_1: value_1
      config_2: value_2
    - plugin: plugin_name
      config_1: value_1
      config_2: value_2
It is very important to note that only the first process plugin in the chain should set a `source` configuration. Remember that the output of the previous process plugin is the input for the next one. Setting the `source` configuration in subsequent process plugins is unnecessary and can actually make the chain produce unexpected results or fail altogether.
Source constants, pseudofields, and the process pipeline
We have covered source constants, pseudofields, and the process pipeline in the article on using data placeholders in the migration process. This time, we are only going to give an overview to explain their syntax. Constants are arbitrary values that can be used later in the process pipeline. They are set as direct children of the `source` section. Let’s consider this example:
source:
  constant:
    DRUPAL_LINK_TITLE: 'Drupal.org profile'
    GITLAB_LINK_TITLE: 'GitLab profile'
    GITHUB_LINK_TITLE: 'GitHub profile'
process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title: constant/DRUPAL_LINK_TITLE
  field_online_profiles/1/uri: source_gitlab_profile
  field_online_profiles/1/title: constant/GITLAB_LINK_TITLE
  field_online_profiles/2/uri: source_github_profile
  field_online_profiles/2/title: constant/GITHUB_LINK_TITLE
To define source constants, you write a `constant` key under the `source` section and set its value to an array of name-value pairs. Many migrations name this key `constants` instead; what matters is that references in the `process` section use the same name as the key. When you need to refer to them in the `process` section, you use `constant/NAME` and they behave like any other column present in the source. Although not required, it is customary to name constants in uppercase. This makes it easier to distinguish them from regular source columns. Notice how their use makes assigning the link titles simpler. Instead of using the `default_value` plugin, we read the value directly from the source constants.
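Because constants behave like source columns, they can also be passed as input to process plugins. The following sketch assumes a hypothetical `source_drupal_username` column and a hypothetical `PROFILE_BASE_URL` constant, and uses the core `concat` plugin to build the link URL from both.

```yaml
source:
  constant:
    # Hypothetical constant holding the fixed part of the URL.
    PROFILE_BASE_URL: 'https://www.drupal.org/u'
process:
  field_online_profiles/0/uri:
    plugin: concat
    source:
      # A constant and a regular column are referenced side by side.
      - constant/PROFILE_BASE_URL
      - source_drupal_username
    delimiter: '/'
```

Keeping the fixed part in a constant means it is written once and can be reused by other field mappings in the same file.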
Pseudofields also store arbitrary values for use later, but they are defined in the `process` section. Their names can be arbitrary as long as they do not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the source (a column or a constant) or they can use process plugins for data transformations. For the next example, consider that there is no need for the link text to be different among online profiles. Additionally, there is another Link field that can only store one value. This new field is used to store the URL to the primary profile. The example can be rewritten as follows:
source:
  constant:
    LINK_TITLE: 'Online profile'
process:
  pseudo_link_text:
    - plugin: get
      source: constant/LINK_TITLE
    - plugin: callback
      callable: strtoupper
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title: '@pseudo_link_text'
  field_online_profiles/1/uri: source_gitlab_profile
  field_online_profiles/1/title: '@pseudo_link_text'
  field_online_profiles/2/uri: source_github_profile
  field_online_profiles/2/title: '@pseudo_link_text'
  field_primary_profile: '@field_online_profiles/0'
A pseudofield named `pseudo_link_text` has been created. It has its own process pipeline to provide the link text that will be used for all online profiles. When you want to use the pseudofield, you have to enclose it in quotes (') and prepend an at sign (@) to the name. The `pseudo_` prefix in the name is not required. In this case, it is used to make it easier to distinguish pseudofields from regular property or field names.
The previous snippet is also a good example of how the migrate process pipeline works. When setting `field_primary_profile`, we are reusing a value stored in another field: the first delta of `field_online_profiles`. There are many things to note here:
- The migrate process pipeline lets you reuse anything that has been defined previously in the file. It can be source constants, pseudofields, or regular destination properties and fields. The only requirement is that whatever you want to use needs to be previously defined in the migration definition file.
- Source columns are accessed directly by name. Source constants are accessed using the `constant/NAME` syntax.
- Any element defined in the `process` section can be reused later in the process pipeline by enclosing its name in quotes (') and prepending an at sign (@). This applies to pseudofields and regular destination properties and fields.
When reusing an element in the process pipeline, its whole structure becomes available. In the previous example, we set `'@field_online_profiles/0'`. This means that all subfields in the first delta of the `field_online_profiles` field will be assigned to `field_primary_profile`. Effectively, this means both the `uri` and `title` properties will be set. Be mindful that when you reuse a field, all its deltas and subfields are copied along unless specifically restricted. For example, if you only want to reuse the `uri` of the first delta, you would use `'@field_online_profiles/0/uri'`. In none of these scenarios does indicating that you want to reuse something guarantee that it will be stored in the new element. For example, the `field_primary_profile` field only accepts one value. Even if we used `'@field_online_profiles'` to reuse all the deltas of the multi-value field, only the first one will be stored per the field's (cardinality) definition.
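The restricted form of reuse can be sketched with the names already used in this article. The snippet is a reduced illustration, not a complete migration: it sets one delta of the multi-value field and then copies only its URL into the single-value field.

```yaml
source:
  constant:
    DRUPAL_LINK_TITLE: 'Drupal.org profile'
process:
  field_online_profiles/0/uri: source_drupal_profile
  field_online_profiles/0/title: constant/DRUPAL_LINK_TITLE
  # Reuses only the uri subfield of the first delta. The link text set
  # above is not copied, so field_primary_profile gets the URL alone.
  field_primary_profile: '@field_online_profiles/0/uri'
```

Because no subfield is named on the left side, the value lands in the default subfield of the Link field, which is `uri` as explained earlier.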
The Migrate API is pretty flexible and you can write very complex process pipelines. The examples we have presented today have been exaggerated to demonstrate many syntax variations. Again, when writing migrations, strive for process pipelines that are easy to read, understand, and maintain.
What did you learn in today's article? Did you know that it is possible to specify deltas and subfields in field mappings? Were you aware that process plugins can be chained for multiple data transformations? How have you used source constants and pseudofields before? Please share your answers in the comments. Also, we would be grateful if you shared this article with your friends and colleagues.