
PINNs_Inference

Physics-informed neural networks (PINNs) are neural networks trained to solve supervised learning tasks while respecting the laws of physics. They are formulated for two classes of problems: the data-driven solution and the data-driven discovery of partial differential equations (PDEs).

Data-driven solution

Given a PDE written in the general residual form $f(t,x)$, a PINN can approximate the solution $u(t,x)$ of this one-dimensional nonlinear equation from only a small amount of training data, namely the initial and boundary conditions of $u(t,x)$ together with collocation points at which the residual $f(t,x)$ is evaluated.
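
Following the original PINN formulation (Raissi et al.), the PDE and its residual can be written in the general form

\[u_{t}+\mathcal{N}[u]=0,\ x \in \Omega,\ t \in [0,T], \qquad f(t,x):=u_{t}+\mathcal{N}[u],\]

where $\mathcal{N}[\cdot]$ is a (possibly nonlinear) differential operator and $f$ is the residual that the network is trained to drive to zero.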

Continuous time inference equation: Burgers' equation

\[f(t,x)=u_{t}+uu_{x}-(0.01/\pi)u_{xx}\]

By minimizing the mean squared error loss $MSE_{u}+MSE_{f}$, which combines the data mismatch of $u(t,x)$ at the initial and boundary conditions with the residual of Burgers' equation at collocation points, the neural network learns an accurate approximation of $u(t,x)$.

\[MSE_{u}=\frac{1}{N_{u}}\sum_{i=1}^{N_{u}}\lvert u(t_{u}^{i},x_{u}^{i})-u^{i} \rvert^2\] \[MSE_{f}=\frac{1}{N_{f}}\sum_{i=1}^{N_{f}}\lvert f(t_{f}^{i},x_{f}^{i}) \rvert^2\]
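
As an illustration of how $MSE_{f}$ can be evaluated under TF 2.x, here is a minimal sketch of the Burgers residual. It is a sketch, not the code of the original repository: it assumes numpy as np, tensorflow as tf, column tensors t and x, and a forward function self.neural_net like the one defined in step 3 below (here taking the concatenated inputs (t, x)).

    def net_f(self, t, x):
         # Residual of Burgers' equation: f = u_t + u*u_x - (0.01/pi)*u_xx
         with tf.GradientTape(persistent=True) as tape:
              tape.watch([t, x])
              u = self.neural_net(tf.concat([t, x], 1), self.weights, self.biases)
              u_x = tape.gradient(u, x)   # first derivative, recorded by the persistent tape
         u_t = tape.gradient(u, t)
         u_xx = tape.gradient(u_x, x)
         del tape
         return u_t + u*u_x - (0.01/np.pi)*u_xx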

Steps are as follows.

  1. Weights between layers are initialized with Xavier (Glorot) initialization via xavier_init
    
    def xavier_init(self, size):
         # Xavier/Glorot initialization: stddev = sqrt(2 / (fan_in + fan_out))
         in_dim = size[0]
         out_dim = size[1]
         xavier_stddev = np.sqrt(2/(in_dim + out_dim))
         # tf.truncated_normal (TF1) becomes tf.random.truncated_normal in TF 2.x
         return tf.Variable(tf.random.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
    
  2. Given the list of layer sizes, the weights and biases for every layer are produced by initialize_NN
    
    def initialize_NN(self, layers):
         weights = []
         biases = []
         num_layers = len(layers)
         # one weight matrix and one bias row vector per pair of consecutive layers
         for l in range(0,num_layers-1):
             W = self.xavier_init(size=[layers[l], layers[l+1]])
             b = tf.Variable(tf.zeros([1,layers[l+1]], dtype=tf.float32), dtype=tf.float32)
             weights.append(W)
             biases.append(b)
         return weights, biases
    
  3. Given the weights and biases, the complete forward pass of the network is implemented by neural_net. In the original code the final layer carries no activation; here a tanh was added to the output layer (see the comment in the code).
    
    def neural_net(self, X, weights, biases):
         num_layers = len(weights) + 1
         # rescale the input from [lb, ub] to [-1, 1]
         H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
         for l in range(0,num_layers-2):
             W = weights[l]
             b = biases[l]
             H = tf.tanh(tf.add(tf.matmul(H, W), b))
         W = weights[-1]
         b = biases[-1]
         Y = tf.tanh(tf.add(tf.matmul(H, W), b)) # YC: tf.tanh was personally added; the original code applies no activation to the output layer.
         return Y
    

    where the first line rescales the input X from [lb, ub] to [-1, 1] before it is passed through the hidden layers.

  4. In the original code, dummy_x0_tf (dummy_x1_tf) serves as the weights of tf.gradients in a forward-gradient trick and is just a constant matrix with the same number of rows as x0 (x1); it is therefore dropped and the gradients are taken with respect to x0 (x1) directly. Besides, to adapt to the TF 2.x framework, tf.gradients() was replaced with tf.GradientTape(), and the gradient computation was merged into the core functions that build the PINN loss.
    
    def net_U0(self, x): # YC: using GradientTape instead of gradients
         with tf.GradientTape() as g:
                 g.watch(x)
                 with tf.GradientTape() as gg:
                         gg.watch(x)
                         U1 = self.neural_net(x, self.weights, self.biases)
                         U = U1[:,:-1]
                 U_x = gg.gradient(U, x)   # first derivative from the inner tape
         U_xx = g.gradient(U_x, x)  # second derivative from the outer tape
         F = 5.0*U - 5.0*U**3 + 0.0001*U_xx
         # dt is only used here, to represent the linear variation in t:
         # U1 denotes the solution at t0 and Ut the solution at t1, where t1-t0 = dt.
         Ut = U1 - self.dt*tf.matmul(F, self.IRK_weights.T)
         return Ut
    def net_U1(self, x): # x1
         U1 = self.neural_net(x, self.weights, self.biases)
         U1_x = self.fwd_gradients_1(U1, x)
         return U1, U1_x
    
  5. The loss combines the mismatch between the prediction (U0_pred) and the measurements (u0) of the PDE solution $u(t,x)$ with the boundary-condition terms, represented by U1_pred and U1_x_pred evaluated at the two boundary points.
    
    self.U0_pred = self.net_U0(self.x0_tf)
    self.U1_pred, self.U1_x_pred = self.net_U1(self.x1_tf)
    self.loss = tf.reduce_sum(tf.square(self.u0_tf - self.U0_pred)) + \
                tf.reduce_sum(tf.square(self.U1_pred[0,:] - self.U1_pred[1,:])) + \
                tf.reduce_sum(tf.square(self.U1_x_pred[0,:] - self.U1_x_pred[1,:]))
    
  6. To minimize the loss, two optimizers, Adam and L-BFGS, are used in sequence. The call to L-BFGS was updated for the TF 2.x framework as follows (see also the sketch of the expected tfp.optimizer.lbfgs_minimize call pattern just after this list).
    
    def train(self, nIter):
         tf_dict = {self.x0_tf: self.x0, self.u0_tf: self.u0, self.x1_tf: self.x1}
         start_time = time.time()
         for it in range(nIter):
             self.sess.run(self.train_op_Adam, tf_dict)
             # Print
             if it % 10 == 0:
                 elapsed = time.time() - start_time
                 loss_value = self.sess.run(self.loss, tf_dict)
                 print('It: %d, Loss: %.3e, Time: %.2f' %
                       (it, loss_value, elapsed))
                 start_time = time.time()
         start = np.arange(0, self.x0.shape[0], dtype="float64")
         # NOTE: tfp.optimizer.lbfgs_minimize expects a callable that returns the loss and
         # its gradients, and an initial_position holding the flattened variables to optimize
         # (see the call-pattern sketch after this list).
         self.optimizer = tfp.optimizer.lbfgs_minimize(self.loss,
                                                       max_iterations = 50000,
                                                       initial_position = start,
                                                       num_correction_pairs = 50,
                                                       max_line_search_iterations = 50)
    
  7. Input data:
    • $x0$ is the list of sampled input locations $x$.
    • $u0$ consists of values of the PDE solution $u(t,x)$ randomly sampled at the positions $x0$ and at time $t0$.
    • $x1$ is the vertically stacked boundary data $[-1, 1]$, i.e. the boundary condition.
    • $dt$ is the length of the time step, $t1-t0$, where $t0$ corresponds to the time of $u0$.
    • layers is a list describing the structure of the neural network.
  8. The model is then trained by calling model.train().
  9. For the prediction of the PDE solution at the sampled locations $x$ and time $t1$, the error between the predicted $u(t1,x)$ and the measurement is defined as follows to show the accuracy; a smaller error indicates better accuracy (see also the formula just after this list).
    
    error = np.linalg.norm(U1_pred[:,-1] - Exact[idx_t1,:], 2)/np.linalg.norm(Exact[idx_t1,:], 2)
    
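Regarding step 6, note that tfp.optimizer.lbfgs_minimize expects its first argument to be a callable returning the loss value and its gradients, and initial_position to be the flattened vector of variables being optimized. A minimal, self-contained sketch of that call pattern (a toy quadratic objective, not the PINN loss) looks like this:

    import tensorflow as tf
    import tensorflow_probability as tfp

    def value_and_gradients(position):
        # toy objective: sum((position - 1)^2); returns (loss, gradient)
        with tf.GradientTape() as tape:
            tape.watch(position)
            loss = tf.reduce_sum(tf.square(position - 1.0))
        return loss, tape.gradient(loss, position)

    results = tfp.optimizer.lbfgs_minimize(value_and_gradients,
                                           initial_position=tf.zeros([10], dtype=tf.float64),
                                           num_correction_pairs=50,
                                           max_iterations=500)
    print(results.converged.numpy(), results.position.numpy()[:3])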

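In formula form, the error reported in step 9 is the relative L2 error

\[\text{error}=\frac{\lVert u_{pred}(t_{1},\cdot)-u_{exact}(t_{1},\cdot)\rVert_{2}}{\lVert u_{exact}(t_{1},\cdot)\rVert_{2}}\]
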
Potential problems

  • During prediction, model.predict() gets two objects from net_U1(), the simulated output of $u(t,x)$ and $\frac{\partial u}{\partial x}$, but the error calculated in step 9 may not take the gradient into account.

Changes in versions of tensorflow

  • Under TensorFlow 2.x, tf.compat.v1 should be added in front of some old v1 functions.
  • The L-BFGS-B method of tf.contrib.opt.ScipyOptimizerInterface can be replaced by tensorflow_probability.optimizer.lbfgs_minimize().

Explanations in programming

  • tf.placeholder() produces a graph tensor, different from tf.Variable() and eager tf.Tensor objects: the placeholder is hashable (it can serve as a dictionary key), while the other two are not.
    • A placeholder with its own name, such as x0_tf, is therefore irreplaceable as a key of tf_dict, which is passed as feed_dict.
  • tf.compat.v1.disable_eager_execution() could not be used, because self.numpy() in resource_variable_ops.py is not implemented when eager execution is disabled.
    • If disable_eager_execution() is used, tensor.ref() is no longer needed.
    • To avoid the error that, under eager execution, the loss passed to Adam must be a callable:
      • tf.compat.v1.placeholder() could not be replaced by tensor.ref().
      • tf.gradients() is replaced by tf.GradientTape(), which requires its targets to be a tensor or a nested structure of tensors; in deep learning the forward function should be applied within the GradientTape context (see the small example after this list).
  • Why both Adam and L-BFGS optimizer were used.
    • With only Adam, the relative error computed with np.linalg.norm between U1_pred[:,-1] (shape (512, 1)) and the exact solution (shape (1, 512)) is 1.37.
    • With both Adam and L-BFGS, the error is smaller.
  • The effect of appending tanh to the output layer in the neural_net function.
  • Effects of removing dummy_x0 (dummy_x1)
    • They are no longer needed once the gradients are computed with tf.GradientTape().
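
A minimal illustration of the GradientTape point above (a toy forward computation, not the PINN): tape.gradient() only works for operations that were executed while the tape context was active.

    import tensorflow as tf

    x = tf.constant([[0.5]])
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = x ** 2                 # forward pass executed inside the tape context
    dy_dx = tape.gradient(y, x)    # -> [[1.0]]

    y2 = x ** 2                    # computed outside any tape: nothing is recorded,
                                   # so a tape that did not see y2 cannot differentiate it
    print(dy_dx.numpy())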

Continuous time inference equation: Schrödinger equation

Note that Burgers' equation is a one-dimensional equation whose temporal variation is represented linearly, so the differences that appear when PINNs are applied to a more complex equation are shown here. The loss function contains initial conditions ($MSE_{0}$), boundary conditions ($MSE_{b}$), and the PDE residual ($MSE_{f}$); notably, $MSE_{b}$ involves both $h$ and $h_{x}$.

\[\begin{align} &ih_{t}+0.5h_{xx}+\lvert h \rvert^{2}h = 0, \quad x \in [-5,5], \ t \in [0, \pi/2] \\ &h(0,x)=2\,\mathrm{sech}(x) \\ &h(t,-5)=h(t,5) \\ &h_{x}(t,-5)=h_{x}(t,5) \\ \end{align}\]
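
Concretely, following the original PINN paper, the three terms are

\[MSE_{0}=\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}\lvert h(0,x_{0}^{i})-h_{0}^{i}\rvert^{2}\] \[MSE_{b}=\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}\left(\lvert h(t_{b}^{i},-5)-h(t_{b}^{i},5)\rvert^{2}+\lvert h_{x}(t_{b}^{i},-5)-h_{x}(t_{b}^{i},5)\rvert^{2}\right)\] \[MSE_{f}=\frac{1}{N_{f}}\sum_{i=1}^{N_{f}}\lvert f(t_{f}^{i},x_{f}^{i})\rvert^{2}\]

where $f:=ih_{t}+0.5h_{xx}+\lvert h\rvert^{2}h$ is the residual of the Schrödinger equation.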

Difference between discrete-time and continuous-time inference

The biggest difference between discrete-time and continuous-time inference lies in the representation of the solution $u(x,t)$: the last layer of the neural network has not 1 output but $q+1$, because the solution is represented through a Runge-Kutta scheme with $q$ stages (the $q$ stage values plus the solution at the next time step).
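
For reference, the general form of a $q$-stage Runge-Kutta step used in the discrete-time model (following Raissi et al.; $\mathcal{N}[\cdot]$ is the nonlinear operator of the PDE) is

\[u^{n+c_{j}}=u^{n}-\Delta t\sum_{k=1}^{q}a_{jk}\,\mathcal{N}[u^{n+c_{k}}],\quad j=1,\dots,q\] \[u^{n+1}=u^{n}-\Delta t\sum_{k=1}^{q}b_{k}\,\mathcal{N}[u^{n+c_{k}}]\]

and the network outputs $[u^{n+c_{1}}(x),\dots,u^{n+c_{q}}(x),u^{n+1}(x)]$, which is why the last layer has $q+1$ units.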

Personal views

  • The innovation lies in the treatment of the loss function.
  • What is the difference between PINNs and simply adding physics constraints to a loss function1? PINNs give a solution of a PDE with the physical terms included in the loss function; they are an NN-based algorithm, different from the latter, which is a process-based model with auxiliary physical concepts in the loss function. From my perspective, PINNs could be further developed to explore causality in process-based models, typically in the environmental sciences, because PDEs are also part of such complex process-based models. Apart from this difference, a PINN is more like physics-constrained machine learning2, which replaces an empirical parameter in a complex equation. Therefore, with the explicit description provided by PINNs and the causality explored by environmental experiments, it looks promising to describe the whole complex model, with reasonably devised sub-networks that dive into the mechanisms in real life.

Further development

  • As mentioned in Personal Views, PINNs have the potential not only to serve as a solution method but also to be extended with more causal structure in their architecture and applied to describing a complex system with better interpretability.
  1. Chen, H., Huang, J. J., Dash, S. S., et al. (2022). A hybrid deep learning framework with physical process description for simulation of evapotranspiration. Journal of Hydrology, 606, 127422. 

  2. Zhao, W. L., Gentine, P., Reichstein, M., Zhang, Y., Zhou, S., Wen, Y., et al. (2019). Physics-constrained machine learning of evapotranspiration. Geophysical Research Letters, 46, 14496–14507. https://doi.org/10.1029/2019GL085291. 

This post is licensed under CC BY 4.0 by the author.