The Journey of a Data Scientist: Chemical Engineering Meets Data Science

Jan 8, 2021

Chemical engineers have a unique blend of math, physics, and chemistry knowledge, and that intersection of expertise has many strengths. What they tend to lack is a strong background in using software to solve data problems. Because software best practices and statistical modeling are not heavily emphasized in chemical engineering, both collaborating with data scientists and learning data science skills are valuable.

Missing data can lead chemical engineers to incorrect insights, and a common consequence of incorrect insights is waste. For example, a chemical process that is not running optimally wastes energy or heat, which equates to wasted money. Similarly, processing errors can waste materials, time, and financial resources.

Sometimes the stakes are higher. If something goes wrong in a chemical reaction or process, the safety consequences can be catastrophic for the plant operator and potentially for the whole community. In the biochemical field, a mistake in a drug, a vaccine, a clinical trial, or the use of a diagnostic tool can affect consumers’ lives.

A better understanding of data science, and better application of it, can alleviate these concerns. Here are three practical examples:

Process optimization: The bread and butter of chemical engineering is knowing how to develop a process to make a product that solves some problem. Data science, machine learning (ML), and digital twins can help to optimize this process and improve fault detection.

For instance, in the pharmaceutical industry, defining the process window for formulating a particular product or drug is highly experiment- and resource-intensive and involves numerous bioreactor runs. A well-defined window is needed to show that the process is stable despite perturbations, that all product quality metrics will stay within specified ranges, and that a target yield will be achieved. Data science can accelerate this work by simulating the process with a digital twin and applying predictive models to determine how the process window is affected by the different inputs and choices made along the way.
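To make the idea concrete, here is a minimal Python sketch of one way a predictive model can map a process window. It assumes, hypothetically, that each bioreactor run is summarized by temperature and pH and produces a single quality metric. It also uses a plain linear least-squares fit in place of a full digital twin. The run data are synthetic, and the function names (fit_linear, process_window) are illustrative, not taken from any particular library.

```python
# Minimal sketch: map a process window with a fitted predictive model.
# Hypothetical setup: each run is (temperature in deg C, pH) -> quality metric.
# The "historical runs" below are synthetic, generated from a known formula.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature):
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature))


def process_window(weights, temps, phs, low, high):
    """Grid points (temp, pH) whose predicted metric falls inside [low, high]."""
    return [(t, p) for t in temps for p in phs
            if low <= predict(weights, (t, p)) <= high]


# Synthetic runs: metric = 10 + 0.5*T - 2*pH (hypothetical relationship).
runs = [(t, p) for t in (30.0, 34.0, 37.0) for p in (6.8, 7.0, 7.2)]
metric = [10 + 0.5 * t - 2 * p for t, p in runs]

weights = fit_linear(runs, metric)
window = process_window(weights, [30.0, 32.0, 34.0, 36.0], [6.8, 7.0, 7.2],
                        low=12.2, high=13.8)
print(window)
```

Scanning the fitted model over a grid is much cheaper than running a bioreactor at every grid point. That cost difference is the point of the approach. A real study would also quantify the model's uncertainty near the window's edges.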

Figure 1: Machine learning models can be used for fault detection and process optimization for typical chemical engineering components and processes.

Surrogate modeling: Physical models describe how part of a process, whether a component or a piece of equipment, should work. However, models based on first principles or on chemistry or physics calculations are only as valid as their assumptions. For example, a petrochemical plant’s physical model that predicts impurity concentrations in a distillation column’s product stream will lose accuracy when changes in the crude feed invalidate the model’s assumptions.

With data science and ML, a surrogate model built from a physical model can start from the original assumptions and then learn from the data until it is more accurate than the physical model. Once optimized, the ML surrogate model can replace the physical model.

Efficiency is another advantage of surrogate models. Creating an accurate physical model can take an expert a long time, because all the equations must be written down and solved. Physical models are also often too computationally expensive to deploy in production. A surrogate model can be faster to build and computationally cheaper to run.

One example is when chemical engineers work to improve the efficiency of lithium-ion batteries by changing the charging profile to increase battery lifespan and capacity. Changing profiles to get more (or less) energy out of a battery, or to reduce its degradation, typically requires a complicated and computationally expensive physical model. A surrogate model can arrive at the same optimum charging profile as the physical model, but much faster.
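Here is a rough Python sketch of the idea: run the expensive model at a handful of points, fit a cheap surrogate, and optimize the surrogate instead. The "expensive model" here is a hypothetical stand-in function of a single charge rate that returns a degradation score. The surrogate is a least-squares quadratic. A real charging profile has many more parameters and would call for a richer surrogate.

```python
# Minimal sketch: optimize a cheap quadratic surrogate of an expensive model.
# expensive_model is a hypothetical stand-in for a costly physics simulation.


def expensive_model(rate):
    """Hypothetical degradation score as a function of charge rate."""
    u = rate - 0.8
    return 1.5 * u**2 + 0.1 * u**3 + 0.2


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit y ~ c0 + c1*x + c2*x^2, solved with Cramer's rule."""
    s = [sum(x**k for x in xs) for k in range(5)]  # power sums of x
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]  # replace one column with the right-hand side
        coeffs.append(det3(m) / d)
    return coeffs  # [c0, c1, c2]


samples = [0.2, 0.5, 0.8, 1.1, 1.4]  # only five expensive evaluations
values = [expensive_model(r) for r in samples]
c0, c1, c2 = fit_quadratic(samples, values)
best_rate = -c1 / (2 * c2)  # vertex of the fitted parabola (a minimum if c2 > 0)
print(best_rate)
```

With these synthetic numbers, the surrogate's optimum lands close to the stand-in model's true minimum at a rate of 0.8, even though the expensive function was called only five times.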

Figure 2: A Tignis surrogate model predicts wafer bow shape from the stress film pattern; the prediction is used in process optimization to improve yield in semiconductor manufacturing.

Materials optimization and selection: Chemical engineers in research and development are finding new materials that solve problems better. Whether the goal is developing vaccines, finding new cancer therapeutics, or identifying new solar-cell materials that can sustain higher voltages, there is a significant opportunity in R&D for data science and ML to aid in materials optimization and selection. These tools can make experiments faster and more efficient by correlating the different inputs with the desired properties.

Some of the most interesting innovations happening today come from domain experts, such as chemical engineers, working together with data scientists. For example, given only data, an ML model can learn the Navier-Stokes equations, which govern fluid mechanics and describe the behavior of Newtonian fluids in any domain. Although the Navier-Stokes equations are already well known, similar ML approaches can be used to identify fundamental equations for unsolved problems and to understand new phenomena.
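The equation-discovery idea can be illustrated at toy scale with sparse regression over a library of candidate terms. This is the approach popularized by SINDy (sparse identification of nonlinear dynamics). The Python sketch below does not handle Navier-Stokes. Instead, it recovers a hypothetical logistic-growth law, dx/dt = 0.5x - 0.25x^2, from synthetic samples. It assumes derivative values are available directly; in practice they would be estimated from time-series data, for example by finite differences. The function names and the 0.01 threshold are illustrative choices.

```python
# Minimal SINDy-style sketch: sequentially thresholded least squares.
# The "true" law dx/dt = 0.5*x - 0.25*x**2 is a hypothetical toy example.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(columns, y):
    """Fit y ~ sum(w_i * column_i) via the normal equations."""
    k = len(columns)
    ata = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    aty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(ata, aty)


def sparse_fit(library, y, threshold=0.01, iterations=5):
    """Repeatedly fit, then drop terms whose |coefficient| < threshold."""
    active = list(library)
    coeffs = {}
    for _ in range(iterations):
        if not active:
            break
        weights = least_squares([library[name] for name in active], y)
        coeffs = dict(zip(active, weights))
        active = [name for name in active if abs(coeffs[name]) >= threshold]
    return {name: coeffs[name] for name in active}


# Synthetic samples of the state x and its rate of change dx/dt.
x = [0.1 * i for i in range(1, 21)]
dxdt = [0.5 * xi - 0.25 * xi**2 for xi in x]

# Candidate terms the discovered equation is allowed to use.
library = {
    "1": [1.0] * len(x),
    "x": x,
    "x^2": [xi**2 for xi in x],
    "x^3": [xi**3 for xi in x],
}
print(sparse_fit(library, dxdt))  # expect only the "x" and "x^2" terms
```

The thresholding step provides the "discovery". Terms the data do not support are pruned, which leaves a short, interpretable equation rather than a dense fit over every candidate term.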

Here at Tignis, rather than using ML alone, we add physical constraints to our models. This gives our customers models that evolve with the dataset and achieve higher accuracy from more limited data.

Deep technical expertise is crucial to applying data science effectively in chemical engineering. If a chemical engineer thinks a problem can be solved with data science, a Tignis data scientist is able to ask the right questions of the data, choose the right type of model, select the dataset that will provide the most benefit, and interpret the model to make sure it is making the right choices for the right reasons.

Written by Tignis