PINNs_Inference
Trained to solve supervised learning
tasks, Physic-informed neural networks (PINNs) are defined for two classes of problems:
data-driven solution and data-driven discovery of partial differential equations (PDEs).
Data-driven solution
Given a general form of PDEs $f(t,x)$, PINN could get to the solution $u(t,x)$ of this 1-dimensional non-linear equation with only small amount of training data, namely initial and boundary conditions of $u(t,x)$ as well as the sampled data of $f(t,x)$.
Continuous time inference equation: Burger’s equation
\[f(t,x)=u_{t}+uu_{xx}-(0.01/\pi)u_{xx}\]By minimizing the mean square error loss $MSE_{u}+MSE_{f}$, including the data measurements of $u(t,x)$ at initial and boundary conditions, and the supervised loss of the Burger’s equation, $u(t,x)$ is accurately simulated by neural networks.
\[MSE_{u}=\frac{1}{N_{u}}\sum_{i=0}^{N_{u}}\lvert u(t_{u}^{i},x_{u}^{i})-u^{i} \rvert^2\] \[MSE_{f}=\frac{1}{N_{f}}\sum_{i=0}^{N_{f}}\lvert f(t_{f}^{i},x_{f}^{i}) \rvert^2\]Steps are as follows.
- Weights between layers are defined like
xavier_init
1 2 3 4 5
def xavier_init(self, size): in_dim = size[0] out_dim = size[1] xavier_stddev = np.sqrt(2/(in_dim + out_dim)) return tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
- Given the number of
layers
, weights and biases are produced byinitialize_NN
1 2 3 4 5 6 7 8 9 10
def initialize_NN(self, layers): weights = [] biases = [] num_layers = len(layers) for l in range(0,num_layers-1): W = self.xavier_init(size=[layers[l], layers[l+1]]) b = tf.Variable(tf.zeros([1,layers[l+1]], dtype=tf.float32), dtype=tf.float32) weights.append(W) biases.append(b) return weights, biases
- Given weights and biases, a complete neural network without the final activation is designed by
neural_net
1 2 3 4 5 6 7 8 9 10 11
def neural_net(self, X, weights, biases): num_layers = len(weights) + 1 H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0 for l in range(0,num_layers-2): W = weights[l] b = biases[l] H = tf.tanh(tf.add(tf.matmul(H, W), b)) W = weights[-1] b = biases[-1] Y = tf.tanh(tf.add(tf.matmul(H, W), b)) # YC: tf.tanh was personally added. return Y
where,
H
seems like restricting the value of the first layer within [-1,1]. - As weights,
dummy_x0(x1)_tf
is identical matirx ofx0(x1)
rows, and are thus replaced byx0(x1)
. Besides, to adjust to the tf 2.x framework, tf. gradients() was replaced with tf.GradientTape(), and gradient process was merged into the core part of representing loss functions in PINNs.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
def net_U0(self, x): # YC: using GradientTape instead of gradients with tf.GradientTape() as g: g.watch(x) with tf. GradientTape() as gg: gg.watch(x) U1 = self.neural_net(x, self.weights, self.biases) U = U1[:,:-1] U_x = gg.gradient(U, x) U_xx = gg.gradient(U_x, x) F = 5.0*U - 5.0*U**3 + 0.0001*U_xx Ut = U1 - self.dt*tf.matmul(F, self.IRK_weights.T) # dt was only used here in the PINNs to represent the linear variation in the dimension of t. U1 denotes the solution at t0, and Ut denotes the solution at t1, where t1-t0=dt. return Ut def net_U1(self, x): # x1 U1 = self.neural_net(x, self.weights, self.biases) U1_x = self.fwd_gradients_1(U1, x) return U1, U1_x
loss
is defined by loss between the prediction (U0_pred) and the measurement(u0) of the PDE solution ($u(t,x)$) as well as the loss of boundary condition, represented by U1.1 2 3 4 5
self.U0_pred = self.net_U0(self.x0_tf) self.U1_pred, self.U1_x_pred= self.net_U1(self.x1_tf) self.loss = tf.reduce_sum(tf.square(self.u0_tf - self.U0_pred)) + \ tf.reduce_sum(tf.square(self.U1_pred[0,:] - self.U1_pred[1,:])) + \ tf.reduce_sum(tf.square(self.U1_x_pred[0,:] - self.U1_x_pred[1,:]))
- To minimize the loss function, optimizers,
Adam
andL-BFGS
, were utilized in sequence. The appliance of L-BFGS was updated in tf 2.x framework and updated as follows.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
def train(self, nIter): tf_dict = {self.x0_tf: self.x0, self.u0_tf: self.u0, self.x1_tf: self.x1} start_time = time.time() for it in range(nIter): self.sess.run(self.train_op_Adam, tf_dict) # Print if it % 10 == 0: elapsed = time.time() - start_time loss_value = self.sess.run(self.loss, tf_dict) print('It: %d, Loss: %.3e, Time: %.2f' % (it, loss_value, elapsed)) start_time = time.time() start = np.arange(0, self.x0.shape[0], dtype="float64") self.optimizer = tfp.optimizer.lbfgs_minimize(self.loss, max_iterations = 50000, initial_position = start, num_correction_pairs = 50, max_line_search_iterations = 50)
- input data:
- $x0$ is a list of sampled input x.
- $u0$ is randomly selected through the PDE solution, $u(t,x)$, at specific position, $x0$ and $t0$.
- $x1$ is vertically stacked data [-1,1], the boundary condition.
- $dt$ is the random length of
t
, $t1-t0$, where $t0$ correponds to the position of $u0$. layers
is a list, representing the strusture of $f(t,x)$.
- With tThe model.train()
- In the prediction of the PDE solution at the condition of $x^{}$ and $t1$, the error between the predicted $u(t1,x^{})$ and the measurement is defined as follows to show the accuracy. A smaller error shows a better accuracy.
1
error = np.linalg.norm(U1_pred[:,-1] - Exact[idx_t1,:], 2)/np.linalg.norm(Exact[idx_t1,:], 2)
Potential problems
- In training, model.predict() gets two objects from net_U1(), the simulated output of $u(t,x)$ and $\frac{\partial u}{\partial x}$, but the error calculated in
step 7
may not deal with the gradient?
Changes in versions of tensorflow
- tf.compat.v1 should be added in front of some old v1 functions in v2 version of tensorflow.
L-BFGS-B
method in tf.opt.ScipyOptimizerInterface could be replaced by tensorflow_probability.optimizer.lbfgs_minimize()
Explanations in programming
tf.placeholder()
generates a map, different from tf. Varaibles() and tf.Tensor(). This type is hashable, while the other two are not.- placeholder() with newly named input
x0_tf
as tf_dict.keys() forfeed_dict
is irreplaceable.
- placeholder() with newly named input
tf.compat.v1.disable_eager_execution()
couldn’t be used, as in resources_variable_ops.py script, self.numpy() wouldn’t be implemented.- If the
disable_eager_execution()
is used, thentensor.ref()
is useless. - To avoid Error: loss in Adam should be a function in eager execution, as a result,
*
tf.compat.v1.placeholder()
couldn’t be replaced bytensor.ref()
. *tf.gradients()
is taken place bytf.GradientTape()
, which requires targets to be a list or nested tensors. In the case of deep learning, the forward function should be appplied within GradientTape().
- If the
- Why both Adam and L-BFGS optimizer were used.
- With only Adam, the error calculated by
np.linalg.norm$ between
U1_pred[:,-1] (512,1) and $x$ (1, 512) is 1.37, - With both Adam and L-BFGS, the error is smaller
- With only Adam, the error calculated by
- The effect of appending
tanh
in neural_net function. - Effects of removing dummy_x0(x1)
- They are removed in tf.GradientTape()
Continuous time inference equation: Shrödinger equation
It’s noticed that Burger’s equation is one-dimemsional equation, and the temporal variation is linearly represented. So, the difference in applying PINNs in a more complex equation is shown. The loss function contains initial conditions($MSE_{0}$), boundary conditions($MSE_{b}$), and data of PDE($MSE_{f}$). And it is noticeable that $MSE_{b}$ considers both $h$ and $h_{x}$.
\[\begin{align} &ih_{t}+0.5h_{xx}+\lvert h \rvert^{2} = 0, x \in [-5,5], t \in [0, \pi/2] \\ &h(0,x)=2sech(x) \\ &h(t,-5)=h(t,5) \\ &h_{x}(t,-5)=h_{x}(t,5) \\ \end{align}\]Difference between discrete time inference and continuous one
The biggest difference between discrete time inference and continuous one lies in the representation of the solution $u(x,t)$, where the last layer of the neural network is not 1 but q in usage, because it is exhibited by Runge-Kutta methods with q stages.
Personal views
- The innovation lies in the treatment of loss function
- The difference between
PINNs
andadding physics constraints into loss function
1 ? + PINNs gives a solution of PDE, with physical elements included in the loss function. It’s a NN-based algorithm, different from the latter that is process-based model with auxiliary physical concepts in the loss function. From my perspective, since PINNs could be further developped in the exploration of causality in the process-based models, typically in the environemntal sciences, because PDE is also a part of the complex process-based model. Apart from the differece, PINN is more like the physics-constrained machine learning2, which takes place of an emipirical parameter in a complex equation. Therefore, with the explicit desrciption provided by PINNs and the causality explored by environemntal experiments, it is prospective to decribe the whole complex model, with reasonably devised sub NNs to dive into the mechanisms in real life.
Further development
- As mentioned in
Personal Views
, PINNs is potential to be not only a solution but to be updated with more causality in its architecture and applied in describing a complex system with better interpretability.
-
Chen H, Huang J J, Dash S S, et al. A hybrid deep learning framework with physical process description for simulation of evapotranspiration[J]. Journal of Hydrology, 2022, 606: 127422. ↩
-
Zhao, W. L., Gentine, P., Reichstein, M., Zhang, Y., Zhou, S., Wen, Y., et al. (2019). Physics-constrained machine learning of evapotranspiration. Geophysical Research Letters, 46, 14496–14507. https://doi.org/10.1029/2019GL085291. ↩