Solving the Time-Dependent 2D Heat Equation using Machine Learning at USC
Project members:
Bryan Shaddy
Javier Murgoitio-Esandi
Zhou Xu
The goal of this project is to predict the temperature field in a conductive material after a given time using a fully convolutional network (U-Net architecture), which takes an initial temperature field as input and returns the predicted change in temperature over one time step. By recursively feeding the U-Net its own prediction, the model can also predict the temperature field in the material at later times.
The algorithm consists of two major components. Firstly, we use FEniCS, a Finite Element (FE) solver, to generate data consisting of time-dependent solutions of the heat conduction equation for many initial conditions, solved on a 64x64 mesh that is insulated at the boundaries. The training data are organized into pairs, where each pair consists of the temperature distribution at one time and the change in temperature after one time step (0.001 s). In total we have 600 solutions of the heat conduction equation, with 18 time steps per solution, giving 10,800 training pairs. We also have 3,600 validation pairs and a further 3,800 testing pairs. The testing pairs, however, are composed of 100 solutions with 38 time steps each, which lets us check whether the model can predict solutions beyond the time horizon for which it was trained.
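As a rough sketch of this first component, the snippet below generates one solution trajectory with legacy FEniCS (dolfin) using implicit-Euler time stepping and packs it into (state, change) pairs; the Gaussian initial condition and unit diffusivity are illustrative assumptions, not the project's exact setup.

```python
# Hedged sketch of the FEniCS data generation; the initial condition and
# diffusivity below are assumptions for illustration only.
from fenics import (UnitSquareMesh, FunctionSpace, TrialFunction,
                    TestFunction, Function, interpolate, Expression,
                    dx, dot, grad, solve, lhs, rhs)

dt = 0.001                     # time step used for the training pairs
k = 1.0                        # assumed thermal diffusivity
mesh = UnitSquareMesh(63, 63)  # 64x64 grid of vertices
V = FunctionSpace(mesh, "P", 1)

# Hypothetical smooth initial temperature field (Gaussian bump)
u_prev = interpolate(
    Expression("exp(-50*(pow(x[0]-0.5,2)+pow(x[1]-0.5,2)))", degree=2), V)

# Implicit-Euler weak form of u_t = k * laplace(u); the insulated boundary
# is the natural zero-flux Neumann condition, so no BC needs to be imposed.
u, v = TrialFunction(V), TestFunction(V)
F = (u - u_prev) * v * dx + dt * k * dot(grad(u), grad(v)) * dx
a, L = lhs(F), rhs(F)

pairs = []
u_next = Function(V)
for step in range(18):  # 18 time steps per training solution
    solve(a == L, u_next)
    x_in = u_prev.compute_vertex_values(mesh).reshape(64, 64)
    x_out = u_next.compute_vertex_values(mesh).reshape(64, 64)
    pairs.append((x_in, x_out - x_in))  # (state, change over one step)
    u_prev.assign(u_next)
```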
Secondly, we construct and train a fully convolutional network with a U-Net architecture that takes the current state (the temperature distribution of the conductive region) as input and returns the change in the temperature field after one time step. We train the network with mini-batch optimization using the Adam optimizer.
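A minimal sketch of this second component in tf.keras is shown below; `build_unet` refers to the architecture sketched in the Architecture section, and the batch size, epoch count, and random placeholder data are illustrative stand-ins, not the project's exact configuration.

```python
# Hedged sketch of mini-batch training with Adam in tf.keras.
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the FEniCS-generated pairs: inputs
# are temperature fields, targets are the change after one 0.001 s step.
x_train = np.random.rand(10800, 64, 64, 1).astype("float32")
y_train = np.random.rand(10800, 64, 64, 1).astype("float32")
x_val = np.random.rand(3600, 64, 64, 1).astype("float32")
y_val = np.random.rand(3600, 64, 64, 1).astype("float32")

model = build_unet(input_shape=(64, 64, 1))  # sketched under Architecture
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mae")  # mean L1 error; the regularized loss is below

history = model.fit(x_train, y_train, batch_size=32, epochs=100,
                    validation_data=(x_val, y_val))
```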
Key skills:
Python
U-Net Architecture
TensorFlow
Architecture:
In this study, we approximate the map from a 2D scalar field (the initial temperature) to another 2D scalar field (the temperature after one time step), so we use image-to-image architectures. In particular, we use U-Net (encoder-decoder) architectures based on the one presented by Ronneberger et al. (2015), which is composed of an encoder (downsampling) part and a decoder (upsampling) part. We have studied two different architectures, one consisting of two downsampling steps (Architecture 1, left) and another consisting of three (Architecture 2, right). A schematic representation of these architectures is shown in the figure.
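As an illustration, the sketch below builds a model along the lines of Architecture 1 (two downsampling steps) in tf.keras; the filter counts and kernel sizes are assumptions rather than the project's exact configuration.

```python
# Hedged sketch of Architecture 1: a U-Net with two downsampling steps,
# loosely following Ronneberger et al. (2015).
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions; "same" padding keeps the 64x64 image size
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(64, 64, 1)):
    inputs = layers.Input(shape=input_shape)

    # Encoder: two downsampling steps
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck
    b = conv_block(p2, 128)

    # Decoder: two upsampling steps with skip connections
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 32)

    # Linear output: predicted change in temperature over one time step
    outputs = layers.Conv2D(1, 1, activation=None)(c4)
    return tf.keras.Model(inputs, outputs)
```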
Loss function:
To build the loss function, we considered two error norms commonly used in image-to-image applications: the mean absolute error (mean L1 norm) and the mean squared error (mean L2 norm). The loss function for every training batch is given by:
\[
\mathcal{L} = \frac{1}{N_{\mathrm{batch}}} \sum_{i=1}^{N_{\mathrm{batch}}} L\!\left(Y_i, \hat{Y}_i\right) + \frac{\alpha}{n_{\mathrm{param}}} \sum_{j=1}^{n_{\mathrm{param}}} \theta_j^{2}
\]
where N_batch is the number of training samples in the batch, n_param is the number of network parameters, L is the error norm, Y_i is the true sample, Ŷ_i is the prediction, α is the regularization parameter, and θ is the vector of network parameters.
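For concreteness, here is a hedged sketch of this batch loss in TensorFlow, with the L2 penalty on the weights averaged over the n_param parameters; the exact reductions used in the project may differ.

```python
# Hedged sketch of the regularized batch loss described above.
import tensorflow as tf

def batch_loss(model, y_true, y_pred, alpha=1e-10, norm="l1"):
    # Data term: mean L1 or mean L2 error over the batch
    if norm == "l1":
        data_term = tf.reduce_mean(tf.abs(y_true - y_pred))
    else:
        data_term = tf.reduce_mean(tf.square(y_true - y_pred))
    # Regularization term: mean squared network parameter, scaled by alpha
    n_param = sum(int(tf.size(w)) for w in model.trainable_weights)
    reg_term = tf.add_n([tf.reduce_sum(tf.square(w))
                         for w in model.trainable_weights]) / n_param
    return data_term + alpha * reg_term
```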
Validation and testing:
The figure gives an overall picture of the training process by showing the training and validation losses for various architectures and hyperparameters, including the regularization parameter (α) and the learning rate (η). Figure a shows the training loss for various learning rates using Architecture 1 and α = 10^-10. We observed that very small learning rates do not perform better than larger ones; in fact, we found that by continuously reducing the learning rate throughout training we can achieve lower training losses. More schemes of this kind were explored; the runs whose loss curves are shown in Figures b and c were carried out starting with η = 10^-3 and reducing it by 10% every 15 epochs. Figures b and c show the training and validation losses, respectively, for Architectures 1 and 2 and two regularization parameters (α = 10^-8 and α = 10^-10). We observe that although the more complex architecture (Architecture 2) achieves a slightly smaller training loss, its validation loss is considerably higher. For this reason, we use Architecture 1 to analyze the predictions of the model in the Results section. It also seems worthwhile to further explore adaptive learning-rate methods to understand how the learning rate can be used to train our model optimally. Regarding regularization, a larger range of regularization parameters was studied, and we concluded that this architecture can be trained successfully with a very low regularization parameter (α = 10^-10).
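To make the schedule concrete, below is a small sketch of the step-decay scheme described above (start at η = 10^-3, reduce by 10% every 15 epochs) written as a tf.keras callback; this is one way to implement it, not necessarily the project's exact code.

```python
# Hedged sketch of the step-decay learning-rate scheme.
import tensorflow as tf

def step_decay(epoch, lr):
    # Multiply the learning rate by 0.9 at the start of every 15th epoch
    if epoch > 0 and epoch % 15 == 0:
        return lr * 0.9
    return lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay, verbose=1)
# Used as: model.fit(..., callbacks=[lr_callback])
```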
Results and Discussion:
The purpose of this study was to use a U-Net (encoder-decoder) architecture to solve the time-dependent heat equation. The previous section showed how we trained the U-Net model to achieve high accuracy in its predictions. In this section, we use the model to solve the forward-in-time problem and show how well it preserves the mean temperature in the region over the modeled time interval. The figure compares the predictions of the U-Net model with the FEM solutions (target solutions) at different time steps for one testing sample. The U-Net predictions were obtained by supplying the initial condition (t = 0 s) to the model and recursively computing the following time steps (using the previous prediction as input to the model) up to the final time available in the FEM solutions (t = 0.038 s). We observe that the U-Net predictions are very accurate even at the last time step. In fact, the model predicts the temperature in the “hotter” region more accurately at the last time step than at the first, probably due to the nonlinearity of the temperature distribution in the initial condition. We also observe that the largest error at the last time step is located on the boundary, where the expressivity of the U-Net model is limited by the padding used to maintain the image size.
Figure: Predictions of the model at t = 0.001, 0.016, and 0.038 s, obtained by supplying the initial condition (t = 0 s) to the model and then recursively supplying its own prediction (the output added to the input). Panels a), d), and g) show the FEM solutions; b), e), and h) show the CNN model’s predictions; and c), f), and i) show the difference between the CNN model’s predictions and the FEM solutions.
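As a closing illustration, here is a hedged sketch of the recursive forward solve described above, which also tracks the mean temperature (conserved on an insulated domain); `model` is assumed to be the trained U-Net, and `rollout` is a name introduced here for illustration.

```python
# Hedged sketch of the recursive forward solve: feed the initial field to
# the model, add the predicted change back to the input, repeat 38 times
# (up to t = 0.038 s), and track the mean temperature along the way.
import numpy as np

def rollout(model, u0, n_steps=38):
    u = u0.reshape(1, 64, 64, 1).astype(np.float32)
    states, mean_temps = [u[0, ..., 0]], [float(u.mean())]
    for _ in range(n_steps):
        du = model.predict(u, verbose=0)  # predicted change over 0.001 s
        u = u + du                        # output added to the input
        states.append(u[0, ..., 0])
        mean_temps.append(float(u.mean()))
    return np.stack(states), np.array(mean_temps)
```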