Declarative Beats Imperative: Transforming Data In JavaScript

Jakub Holy, April 10th, 2015 (Last Updated: April 6th, 2015)


The product I am working on is a fairly typical web application, based on React.js and Node.js. Its main task is to fetch data from different REST-ish services, combine them, and present them in an attractive manner to users.

This data extraction and transformation – keeping only the items and properties we care about and structuring them in a way well aligned with the needs of the UI – is thus crucial. I would like to share our journey from an imperative implementation of this process to a much more reader-friendly declarative one.

For example, here is some data we get back from a REST service (shown as a JavaScript object literal):

[{
    productId: '42',
    type: 'TEAPOT',
    subProducts: [{
        productNumber: 'ch132',
        name: 'Kyusu Teapot',
        dealerDetails: [
            { dealerGroup: 'webshop', rank: 1 }
        ]
    }]
}];

And here is what we would like to transform it into:

{
  '42': {
    'productId': '42',
    'variations': {
      'ch132': {
        'productNumber': 'ch132',
        'name': 'Kyusu Teapot'
      }
    },
    'selectedSubProductId': 'ch132'
  }
}

My first implementation was functional, using the wonderful functional library lodash, but rather imperative. We had a series of functions that transformed some vague JavaScript object / JSON into something else. It looked something like this:

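var _ = require('lodash');

// Group the products by ID, keep the first product of each group
// and transform it
var products = _.chain(data)
    .groupBy('productId')
    .mapValues(_.flow(_.first, transformProduct))
    .value();

function transformProduct(product) {
    return {
        productId: product.productId,
        // The same group-and-take-first step, one level down
        variations: _.chain(product.subProducts)
            .groupBy('productNumber')
            .mapValues(_.flow(_.first, transformSubProduct)) // transformSubProduct not shown
            .value()
    };
}
// + more stuff left out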
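For readers who don't use lodash, the groupBy + mapValues(_.flow(_.first, ...)) combination above groups the elements by a key, keeps only the first element of each group and transforms it. The following dependency-free sketch reproduces that imperative pipeline on the sample data. The helper name groupFirstAndMap is hypothetical. The article does not show transformSubProduct, so the version here is an assumption that keeps only the two properties present in the desired output. The sketch also does not produce selectedSubProductId, whose derivation is not shown.

```javascript
// Dependency-free equivalent of groupBy(key) + mapValues(flow(first, fn)):
// group elements by `key`, keep the first element of each group, transform it.
// groupFirstAndMap is a hypothetical helper name.
function groupFirstAndMap(items, key, fn) {
  const result = {};
  for (const item of items) {
    const groupKey = item[key];
    // Only the first element seen for a key is kept, like _.first on a group.
    if (!Object.prototype.hasOwnProperty.call(result, groupKey)) {
      result[groupKey] = fn(item);
    }
  }
  return result;
}

// Assumption: the article does not show transformSubProduct; this version
// keeps the two properties that appear in the desired output.
function transformSubProduct(subProduct) {
  return { productNumber: subProduct.productNumber, name: subProduct.name };
}

function transformProduct(product) {
  return {
    productId: product.productId,
    variations: groupFirstAndMap(product.subProducts, 'productNumber', transformSubProduct)
  };
}

// The sample input from the article.
const data = [{
  productId: '42',
  type: 'TEAPOT',
  subProducts: [{
    productNumber: 'ch132',
    name: 'Kyusu Teapot',
    dealerDetails: [{ dealerGroup: 'webshop', rank: 1 }]
  }]
}];

console.log(JSON.stringify(groupFirstAndMap(data, 'productId', transformProduct)));
// {"42":{"productId":"42","variations":{"ch132":{"productNumber":"ch132","name":"Kyusu Teapot"}}}}
```

Note that the shape of the output is visible only by reading transformProduct and transformSubProduct together. That is exactly the readability problem described next.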

The problem was that it was difficult to see the structure of both the input data and transformed output data. Without a good knowledge of the structure of the data being processed and the desired outcome, it was difficult to understand the code. It was also difficult to add additional processing, such as the generation of a “transformation audit log,” to the process.

Then a smart colleague of mine, Alex York, implemented part of the transformation in a nice, declarative way. Inspired by that, I reimplemented the whole thing to be much more declarative. Its core is a nested data structure that closely resembles the desired output data, with processing instructions added. Here is a simplified example:

var productCatalogTransformationDefinition = {
    type: 'map',
    singleGroupBy: 'productId',
    elementValue: {
        // A string value copies the source property of that name
        productId: 'productId',
        variations: {
            selector: 'subProducts',
            singleGroupBy: 'productNumber',
            elementValue: {
                productNumber: 'productNumber',
                name: 'name',
                images: {
                    selector: 'additionalDetailsList',
                    // Keep only entries named IMAGE1, IMAGE2, ...
                    filter: function(it) { return /^IMAGE\d+$/.test(it.name); },
                    elementValue: 'value'
                }
            }
        }
    }
};
// + function transform(rawData, transformationDefinition) - see
//   the source code linked below

As you can see, each output value is specified either by the name of a source property (e.g. productId: 'productId') or by an object that defines which operations (based on lodash) to perform on the current “sourceElement” to produce it. Our transformer module then takes this “transformDefinition” together with the JSON input data (typically from a REST service) and turns the input into the desired output data structure.
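The transformer module itself isn't reproduced here. To make the idea concrete, here is a minimal plain-JavaScript sketch of how such an interpreter could work. It is not the author's implementation, and its semantics are inferred from the example definition. A string copies the property of that name. An object with elementValue describes a collection, taken from selector or, without a selector, from the current element. That collection is optionally filtered, then turned into a map by singleGroupBy (first element per key) or into an array otherwise. Any other object is a template whose keys are all computed from the same source element. The sketch uses plain loops instead of lodash, ignores type, and does not cover selectedSubProductId.

```javascript
// Hypothetical sketch of a definition interpreter. The meaning of
// `selector`, `filter`, `singleGroupBy` and `elementValue` is inferred
// from the example definition; `type` is not interpreted.
function transform(source, def) {
  if (typeof def === 'string') {
    // A string copies the property of that name from the source element.
    return source == null ? undefined : source[def];
  }
  if ('elementValue' in def) {
    // Collection rule: a property of the current element, or the current
    // element itself when there is no selector (as at the top level).
    const selected = def.selector ? (source == null ? undefined : source[def.selector]) : source;
    let items = Array.isArray(selected) ? selected : [];
    if (def.filter) items = items.filter(def.filter);
    if (!def.singleGroupBy) {
      // No grouping: one output value per element, in an array.
      return items.map(item => transform(item, def.elementValue));
    }
    // Group by the key and keep the first element of each group
    // (what groupBy + mapValues(first) does in the lodash version).
    const result = {};
    for (const item of items) {
      const key = item[def.singleGroupBy];
      if (!Object.prototype.hasOwnProperty.call(result, key)) {
        result[key] = transform(item, def.elementValue);
      }
    }
    return result;
  }
  // Object template: every output key is computed from the same element.
  const out = {};
  for (const [key, subDef] of Object.entries(def)) {
    out[key] = transform(source, subDef);
  }
  return out;
}

// Usage with the article's sample input and a reduced definition.
const rawData = [{
  productId: '42',
  type: 'TEAPOT',
  subProducts: [{
    productNumber: 'ch132',
    name: 'Kyusu Teapot',
    dealerDetails: [{ dealerGroup: 'webshop', rank: 1 }]
  }]
}];

const definition = {
  type: 'map',
  singleGroupBy: 'productId',
  elementValue: {
    productId: 'productId',
    variations: {
      selector: 'subProducts',
      singleGroupBy: 'productNumber',
      elementValue: { productNumber: 'productNumber', name: 'name' }
    }
  }
};

console.log(JSON.stringify(transform(rawData, definition)));
// {"42":{"productId":"42","variations":{"ch132":{"productNumber":"ch132","name":"Kyusu Teapot"}}}}
```

The interpreter is written once. Every new transformation is then just another definition object, and the shape of that object mirrors the output.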

Pros & Cons

Pros:

Explicit output data form => easier to understand code
Declarative processing enables us to add additional functionality to it, such as an audit log

Cons:

The code is longer (but the difference shrinks as we do more, or more complex, transformations)
The declarative transformations are much less flexible than manual transform code (but good enough for what we need); for example, it currently only supports transforming all elements of a collection in the same way, not differently based on e.g. their type. (Though we could introduce some kind of “unionElement” for that.)
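To illustrate the “unionElement” idea, here is a hypothetical sketch. The names unionElement, discriminator and cases are illustrative assumptions, not part of the actual module, and so are the product types in the sample data. The union rule chooses the definition for each element from the value of its type property. To stay short, the sketch supports only property names and object templates.

```javascript
// Hypothetical sketch: neither `unionElement` nor its `discriminator` /
// `cases` properties exist in the original module.
function transformElement(source, def) {
  if (typeof def === 'string') {
    // Property rule: copy one property of the source element.
    return source[def];
  }
  if (def.unionElement) {
    // Union rule: choose the definition by the element's discriminator value.
    const { discriminator, cases } = def.unionElement;
    const type = source[discriminator];
    const branch = Object.prototype.hasOwnProperty.call(cases, type) ? cases[type] : undefined;
    return branch === undefined ? undefined : transformElement(source, branch);
  }
  // Object template: each output key is computed from the same element.
  const out = {};
  for (const [key, subDef] of Object.entries(def)) {
    out[key] = transformElement(source, subDef);
  }
  return out;
}

// Hypothetical input: two product types with different properties.
const products = [
  { productId: '42', type: 'TEAPOT', capacityMl: 350 },
  { productId: '43', type: 'CUP', material: 'porcelain' }
];

const elementValue = {
  unionElement: {
    discriminator: 'type',
    cases: {
      TEAPOT: { productId: 'productId', capacity: 'capacityMl' },
      CUP: { productId: 'productId', material: 'material' }
    }
  }
};

console.log(JSON.stringify(products.map(p => transformElement(p, elementValue))));
// [{"productId":"42","capacity":350},{"productId":"43","material":"porcelain"}]
```

Elements whose type has no matching case yield undefined here. A real extension would need to decide whether to skip, log or reject them.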

Conclusion

The declarative transformation is much easier to understand. It is also fairly easy to add additional functionality, e.g. to log the data subset being processed at a particular stage when troubleshooting or to create an audit log of the processing so that missing or bad output data can easily be traced to the cause.
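As an illustration of the audit-log idea, here is a hypothetical sketch of a simplified interpreter. It records an entry for every rule it applies, tagged with the path of the output value that rule produces. The function name transformWithAudit, the log entry format and the path notation are all assumptions. The point is that, because the processing is described as data, a single interpreter can report on all of it. Filtering the log for missing properties then points straight at the cause of a gap in the output.

```javascript
// Hypothetical sketch: interpret a (simplified) transformation definition
// and record an audit-log entry for every rule applied, keyed by the
// output path. The log format is an illustrative assumption.
function transformWithAudit(rawData, definition) {
  const log = [];

  function walk(source, def, path) {
    if (typeof def === 'string') {
      // Property rule: copy one property and note whether it was present.
      const value = source == null ? undefined : source[def];
      log.push({ path, rule: 'property ' + def, missing: value === undefined });
      return value;
    }
    if ('elementValue' in def) {
      // Collection rule: select the collection and transform its elements.
      const selected = def.selector ? (source == null ? undefined : source[def.selector]) : source;
      const items = Array.isArray(selected) ? selected : [];
      log.push({ path, rule: 'collection ' + (def.selector || '(root)'), count: items.length });
      if (!def.singleGroupBy) {
        return items.map((item, i) => walk(item, def.elementValue, path + '[' + i + ']'));
      }
      // Keep the first element per key, as in the other examples.
      const out = {};
      for (const item of items) {
        const key = item[def.singleGroupBy];
        if (!Object.prototype.hasOwnProperty.call(out, key)) {
          out[key] = walk(item, def.elementValue, path + '.' + key);
        }
      }
      return out;
    }
    // Object template: each output key is computed from the same element.
    const out = {};
    for (const [key, subDef] of Object.entries(def)) {
      out[key] = walk(source, subDef, path + '.' + key);
    }
    return out;
  }

  return { result: walk(rawData, definition, '$'), log };
}

// Usage: a sub-product without a name shows up in the log.
const { result, log } = transformWithAudit(
  [{ productId: '42', subProducts: [{ productNumber: 'ch132' }] }],
  {
    singleGroupBy: 'productId',
    elementValue: {
      productId: 'productId',
      variations: {
        selector: 'subProducts',
        singleGroupBy: 'productNumber',
        elementValue: { productNumber: 'productNumber', name: 'name' }
      }
    }
  }
);

console.log(JSON.stringify(log.filter(entry => entry.missing).map(entry => entry.path)));
// ["$.42.variations.ch132.name"]
```

Every rule passes through the same walk function, so the logging is added in one place and the definitions themselves stay unchanged.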

There is, of course, much less freedom in what can be done in a transformDefinition than in manual transform code, but it has worked well for us and, after all, the format is quite easy to extend.

An unresolved issue is that the form of the input data is still only implicit. That could be mitigated by describing it with JSON Schema (http://json-schema.org/), but so far having examples of the full input data in our tests has been sufficient for us.

I would like to thank Alex York for his valuable feedback.

References

I found surprisingly few resources dealing with declarative data transformation on either the theoretical or the practical level (any pointers are very welcome!). I know about XSLT and how some Extract-Transform-Load (ETL) tools work, but there must surely be a theoretical body of knowledge behind this.

Regarding JavaScript/JSON in particular, JSONiq, “the JSON query language,” a transformation-oriented superset of JSON, is very interesting. Brett Zamir has a proof of concept of an XSL-like JavaScript class for transforming HTML. Lastly, there is the jsonpath-object-transform package that “pulls data from an object literal using JSONPath and generate a new objects based on a template”. It is the most similar to my solution, except that it is based on JSONPath rather than lodash functions.