Using process plugins for data transformation in Drupal migrations
In the previous entry, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination or to meet business requirements. Today we will learn more about process plugins and how they work as part of the Drupal migration pipeline.
The Migrate API offers a lot of syntactic sugar to make it easier to write migration definition files. Field mappings in the process section are an example of this. Each of them requires a process plugin to be defined. If none is manually set, then the
get plugin is assumed. The following two code snippets are equivalent in functionality.
process:
  title: creative_title

process:
  title:
    plugin: get
    source: creative_title
The get process plugin simply copies a value from the source to the destination without making any changes. Because this is a common operation,
get is considered the default. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:
process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3
The process plugin is configured within an extra level of indentation under the destination field. The
plugin key is required and determines which plugin to use. Then, a list of configuration options follows. Refer to the documentation of each plugin to know what options are available. Some configuration options will be required while others will be optional. For example, the
concat plugin requires a
source, but the
delimiter is optional. An example of its use appears later in this entry.
Providing default values
Sometimes, the destination requires a property or field to be set, but that information is not present in the source. Imagine you are migrating nodes. As we have mentioned, it is recommended to write one migration file per content type. If you know in advance that for a particular migration you will always create nodes of type
Basic page, then it would be redundant to have a column in the source with the same value for every row. The data might not be needed. Or it might not exist. In any case, the
default_value plugin can be used to provide a value when the data is not available in the source.
source: ...
process:
  type:
    plugin: default_value
    default_value: page
destination:
  plugin: 'entity:node'
The above example sets the
type property for all nodes in this migration to
page, which is the machine name of the
Basic page content type. Do not confuse the name of the plugin with the name of its configuration property as they happen to be the same:
default_value. Also note that because a (content)
type is manually set in the process section, the
default_bundle key in the destination section is no longer required. You can see the latter being used in the example from the blog post on writing your first Drupal migration.
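For comparison, here is a minimal sketch of the alternative, where the bundle is set in the destination section instead of the process section. The source section is left out, and the creative_title column is borrowed from the first snippet in this entry purely for illustration.

```yaml
# Sketch only: the source section is omitted, and creative_title is an
# illustrative column name reused from the first snippet.
process:
  title: creative_title
destination:
  plugin: 'entity:node'
  # Used as the bundle when the process section does not set "type".
  default_bundle: page
```

Both approaches produce Basic page nodes. Setting type in the process section keeps all field mappings in one place, while default_bundle keeps the process section shorter.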
Consider the following migration request: you have a source listing people with first and last name in separate columns. Both are written in all uppercase letters. The two values need to be put together (concatenated) and used as the title of nodes of type
Basic page. The character casing needs to be changed so that only the first letter of each word is capitalized. If there is a need to display them in all caps, CSS can be used for presentation. For example:
FELIX DELATTRE would be transformed to Felix Delattre.
Tip: Question business requirements when they might produce undesired results. For instance, if you were to implement this feature as requested,
DAMIEN MCKENNA would be transformed to
Damien Mckenna. That is not the correct capitalization for the last name
McKenna. If automatic transformation is not possible or feasible for all variations of the source data, take notes and perform manual updates after the initial migration. Evaluate as many use cases as possible and bring them to the client’s attention.
To implement this feature, let’s create a new module
ud_migrations_process_intro, create a
migrations folder, and write a migration definition file called
udm_process_intro.yml inside it. Follow the instructions in this entry to find the proper location and folder structure, or download the sample module from https://github.com/dinarcon/ud_migrations. It is the one named
UD Process Plugins Introduction, with machine name
ud_migrations_process_intro. For this example, we assume a Drupal installation using the
standard installation profile, which comes with the
Basic page content type. Let’s see how to handle the concatenation of first and last name.
id: udm_process_intro
label: 'UD Process Plugins Introduction'
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: 'FELIX'
      last_name: 'DELATTRE'
    - unique_id: 2
      first_name: 'BENJAMIN'
      last_name: 'MELANÇON'
    - unique_id: 3
      first_name: 'STEFAN'
      last_name: 'FREUDENBERG'
  ids:
    unique_id:
      type: integer
process:
  type:
    plugin: default_value
    default_value: page
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: ' '
destination:
  plugin: 'entity:node'
The concat plugin can be used to glue together an arbitrary number of strings. Its
source property contains an array of all the values that you want put together. The
delimiter is an optional parameter that defines a string to add between the elements as they are concatenated. If not set, there will be no separation between the elements in the concatenated result. This plugin has an important limitation: you cannot use string literals as part of what you want to concatenate. For example, you cannot join the literal string
Hello with the value of the
first_name column. All the values to concatenate need to be columns in the source or fields already available in the process pipeline. We will talk about the latter in a future blog post.
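To see how the delimiter shapes the result, consider this small variation on the migration above. It is only a sketch: it swaps the order of the two source columns and uses a different delimiter, and it assumes the source and destination sections stay the same as in the full definition.

```yaml
# Fragment of the process section only; source and destination are
# assumed to be the same as in the full migration definition.
process:
  title:
    plugin: concat
    source:
      - last_name
      - first_name
    # For the first row this would give 'DELATTRE, FELIX'.
    # Without a delimiter it would give 'DELATTREFELIX'.
    delimiter: ', '
```

The values are joined in the order they are listed under source, so reordering the list is enough to change the order in the resulting title.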
To execute the above migration, you need to enable the
ud_migrations_process_intro module. Assuming you have Migrate Run installed, open a terminal, switch directories to your Drupal docroot, and execute the following command:
drush migrate:import udm_process_intro
Refer to this entry if the migration fails. If it works, you will see three basic pages whose titles contain the names of some of my Drupal mentors. #DrupalThanks
Chaining process plugins
Good progress so far, but the feature has not been fully implemented. You still need to change the capitalization so that only the first letter of each word in the resulting title is uppercase. Thankfully, the Migrate API allows chaining of process plugins. This works similarly to Unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned to the destination field. Let’s see this in action:
id: udm_process_intro
label: 'UD Process Plugins Introduction'
source: ...
process:
  type: ...
  title:
    - plugin: concat
      source:
        - first_name
        - last_name
      delimiter: ' '
    - plugin: callback
      callable: mb_strtolower
    - plugin: callback
      callable: ucwords
destination: ...
The callback process plugin passes a value to a PHP function and returns its result. The function to call is specified in the
callable configuration option. Note that this plugin expects a
source option containing a column from the source or value of the process pipeline. That value is sent as the first argument to the function. Because we are using the
callback plugin as part of a chain, the source is assumed to be the last output of the previous plugin. Hence, there is no need to define a
source. So, we concatenate the columns, make them all lowercase, and then capitalize each word.
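When callback is not part of a chain, the source has to be named explicitly. A minimal sketch, assuming the same source columns as above and a title built from the first name only (a simplification made just for this illustration):

```yaml
# Fragment of the process section only.
process:
  title:
    plugin: callback
    callable: mb_strtolower
    # Outside of a chain there is no previous output to reuse,
    # so the value to transform must be named here.
    source: first_name
    # For the first row, 'FELIX' would become 'felix'.
```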
Relying on direct PHP function calls should be a last resort. A better alternative is to write your own process plugins, which encapsulate your business logic separately from the migration definition. The
callback plugin comes with its own limitations. For example, you cannot pass extra parameters to the
callable function. It will receive the specified value as its first argument and nothing else. In the above example, we could combine the calls to mb_strtolower() and
ucwords() into a single call to mb_convert_case($source, MB_CASE_TITLE) if passing extra parameters were allowed.
Tip: You should have a good understanding of your source and destination formats. In this example, one of the values you want to transform is
MELANÇON. Because of the cedilla (ç), strtolower() is not adequate in this case, since it would leave that character uppercase (
melanÇon). Multibyte string functions (
mb_*) are required for proper transformation.
ucwords() is not one of them and would present similar issues if the first letter of a word is a special character. Attention should also be given to the character encoding of the tables in your destination database.
mb_strtolower is a function provided by the
mbstring PHP extension. It might not be enabled by default, or it might not be installed at all. In those cases, the function would not be available when Drupal tries to call it. The following error is produced when trying to call a function that is not available:
The "callable" must be a valid function or method.
For Drupal and this particular function, that error would never be triggered, even if the extension is missing. That is because Drupal core depends on some Symfony packages which in turn depend on the
symfony/polyfill-mbstring package. The latter provides a polyfill for
mb_* functions that has been leveraged since version 8.6.x of Drupal.
What did you learn in today’s blog post? Did you know that syntactic sugar allows you to write shorter plugin definitions? Were you aware of process plugin chaining to perform multiple transformations over the same data? Had you considered character encoding on the source and destination when planning your migrations? Are you making your best effort to avoid the
callback process plugin? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with your colleagues.
This blog post series, cross-posted at UnderstandDrupal.com as well as here on Agaric.coop, is made possible thanks to these generous sponsors. Contact Understand Drupal if your organization would like to support this documentation project, whether it is the migration series or other topics.