Proceedings of the Institute of Statistical Mathematics Vol.69, No.1, 5-33 (2021)

Materials Informatics: A Review and Perspectives

Ryo Yoshida
(The Institute of Statistical Mathematics)

In this paper, we present an overview of materials informatics, focusing on the application of machine learning technologies to several inverse problems in materials research. The objective of the forward problem is to predict the output of a system from its input. For example, the input variable corresponds to the structure of a given material and the output variable corresponds to its properties. In the inverse problem, we identify promising candidate materials that exhibit the desired properties by solving the inverse mapping of the forward model. This is a conventional workflow in data science, but one distinct feature of data analysis in materials research lies in the high dimensionality and specificity of the variables. In general, the search space for candidate materials is extremely vast. In addition, in many cases, we deal with variables that are non-trivial to represent as fixed-length vectors, such as compositions, molecules, and crystal structures. In this paper, we describe the essence of machine learning for solving inverse problems by introducing various examples.

Key words: Physical property, materials design, synthesis, inverse problem, descriptor, generative models.


Proceedings of the Institute of Statistical Mathematics Vol.69, No.1, 35-47 (2021)

Machine Learning for Automated Molecular Design with Application to the Discovery of New Polymers with High Thermal Conductivity

Ryo Yoshida
(The Institute of Statistical Mathematics)
Stephen Wu
(The Institute of Statistical Mathematics)
Junko Morikawa
(School of Materials and Chemical Technology, Tokyo Institute of Technology)

We aim to design chemical structures with desired properties by applying analytical techniques of Bayesian inference and machine learning. Based on data obtained from experiments or simulations, we derive a model that predicts, in the forward direction, the physical, chemical, electronic, thermodynamic, and mechanical properties of any given chemical structure. Bayes' rule of conditional probability is applied to this forward model to derive the backward prediction model from property to structure. By generating hypothetical molecules from this model, we identify promising candidates that exhibit the desired properties. We have successfully applied this approach to discover new plastic polymers with thermal conductivity reaching 0.41 W/mK. This corresponds to a performance improvement of about 80% compared to a conventional unoriented polyamide polymer. In this paper, we describe the technology of the Bayesian molecular design algorithm, and then illustrate its application to the study of polymer thermophysical properties.

Key words: Molecular design, Bayesian inference, transfer learning, polymer, thermal conductivity.
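
The backward prediction step described in the abstract above can be written compactly with Bayes' rule. The notation here is assumed for illustration and is not taken from the paper: S denotes a chemical structure, Y its property vector, U the desired property region, p(Y | S) the forward model, and p(S) a prior over chemically plausible structures.

```latex
p(S \mid Y \in U) \;\propto\; p(Y \in U \mid S)\, p(S),
\qquad
p(Y \in U \mid S) = \int_{U} p(y \mid S)\, dy
```

Under this reading, hypothetical molecules drawn from the left-hand distribution concentrate on structures that the forward model predicts to fall inside U.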


Proceedings of the Institute of Statistical Mathematics Vol.69, No.1, 49-63 (2021)

Application of Transfer Learning in Materials Research

Chang Liu
(The Institute of Statistical Mathematics)
Hironao Yamada
(The Institute of Statistical Mathematics/School of Pharmacy, Tokyo University of Pharmacy and Life Sciences)
Stephen Wu
(The Institute of Statistical Mathematics/Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, SOKENDAI)

The digital transformation of materials research has resulted in a broad array of materials property databases; however, the available databases do not include advances realized in machine learning. Transfer learning is a machine learning framework with the potential to break this barrier and identify various properties that are physically interrelated. For a given target property to be predicted from a limited supply of training data, models on related proxy properties are pre-trained using enough data to capture the common features relevant to the target task. Repurposing such machine-acquired features for a target task results in outstanding predictive power even with exceedingly small datasets. We demonstrate transfer learning in various real-world applications, including property prediction of polymers and inorganic materials. In particular, we show several examples in which transfer learning is applied to obtain a predictive capability in a domain that greatly deviates from the training data distribution.

Key words: Transfer learning, novel material design, crystalline, molecular, polymer.
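
The pre-train-then-repurpose idea in the abstract above can be illustrated with a minimal sketch on synthetic data. This is not the authors' method: the linear feature map, the ridge regression head, the function names (`fit_ridge`, `pretrain_features`, `transfer_fit`, `demo`), and all data sizes are assumptions chosen only to keep the example small.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression; y may be a vector or a matrix of outputs."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def pretrain_features(X_src, Y_src, k):
    """Learn a k-dimensional linear feature map from several proxy properties.

    A multi-output ridge model is fitted on the large source dataset and the
    top-k left singular vectors of its weight matrix are kept as features.
    """
    W = fit_ridge(X_src, Y_src)                  # shape (d, n_proxy)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k]                              # shape (d, k)

def transfer_fit(X_tgt, y_tgt, feature_map, lam=1e-3):
    """Freeze the pre-trained features and refit only a small head on target data."""
    head = fit_ridge(X_tgt @ feature_map, y_tgt, lam)
    return feature_map @ head                    # equivalent weights in input space

def demo(seed=0):
    """Compare transfer learning with training from scratch on 10 target samples."""
    rng = np.random.default_rng(seed)
    d, k, n_src, n_tgt, n_test = 30, 3, 500, 10, 200
    B = rng.normal(size=(d, k))                  # shared latent features (synthetic)
    A = rng.normal(size=(k, 5))                  # five proxy properties
    c = rng.normal(size=k)                       # target property

    X_src = rng.normal(size=(n_src, d))
    Y_src = X_src @ B @ A + 0.1 * rng.normal(size=(n_src, 5))
    X_tgt = rng.normal(size=(n_tgt, d))
    y_tgt = X_tgt @ B @ c + 0.1 * rng.normal(size=n_tgt)
    X_test = rng.normal(size=(n_test, d))
    y_test = X_test @ B @ c

    w_transfer = transfer_fit(X_tgt, y_tgt, pretrain_features(X_src, Y_src, k))
    w_scratch = fit_ridge(X_tgt, y_tgt)          # no pre-training, same 10 samples

    mse_transfer = float(np.mean((X_test @ w_transfer - y_test) ** 2))
    mse_scratch = float(np.mean((X_test @ w_scratch - y_test) ** 2))
    return mse_transfer, mse_scratch

if __name__ == "__main__":
    print(demo())
```

In this toy setting the target task has fewer samples than input dimensions, so the model trained from scratch is underdetermined, while the transferred model only has to fit a three-parameter head on top of the frozen features.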


Proceedings of the Institute of Statistical Mathematics Vol.69, No.1, 65-82 (2021)

Challenges in Polymer Informatics

Stephen Wu
(The Institute of Statistical Mathematics/Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, SOKENDAI)
Hironao Yamada
(The Institute of Statistical Mathematics/School of Pharmacy, Tokyo University of Pharmacy and Life Sciences)
Yoshihiro Hayashi
(The Institute of Statistical Mathematics)
Massimiliano Zamengo
(School of Materials and Chemical Technology, Tokyo Institute of Technology)

Polymers can exhibit a wide range of functional properties depending on the design of their monomers and the control of their manufacturing processes. Their broad applications range from the plastic bags and bottles used in daily life to a variety of electronics, and even structural components in the aerospace industry. Polymer informatics is an interdisciplinary research field spanning polymer science, computer science, information science, and machine learning that serves as a platform to exploit existing polymer data for the efficient design of functional polymers. Despite the increasing number of examples of data-driven approaches to polymer design, there have been notable challenges in the development of polymer informatics, such as the lack of open databases and of a unified structural representation, which are attributed to the complex hierarchical structures of polymers. In this paper, we review and discuss the applications of machine learning to different aspects of the polymer design process from four perspectives: polymer databases, representation (descriptors) of polymers, predictive models for polymer properties, and polymer design strategy.

Key words: Polymer informatics, machine learning, high-throughput screening, inverse design, experimental design.


Proceedings of the Institute of Statistical Mathematics Vol.69, No.1, 83-97 (2021)

Machine Learning in Reaction Prediction and Synthetic Route Design

Zhongliang Guo
(The Institute of Statistical Mathematics)

In organic chemistry, predicting the products from the reactants is called reaction prediction, while the design of synthetic routes in the opposite direction, starting from the final products (the target molecules), is called synthetic route design. Reaction prediction and synthetic route design have been studied for more than 50 years. In recent years, advances in machine learning have significantly improved the accuracy of predicting chemical reactions. In this paper, we review the applications of machine learning to chemical reactions published since 2017. In particular, because the prediction of chemical reactions in the forward and reverse directions differs in mathematical formulation, we consider the differences in how machine learning methods are applied. We also introduce a method for designing synthetic routes based on Bayesian inference presented by our group.

Key words: Reaction prediction, synthetic route design, machine learning, Bayesian inference.


Proceedings of the Institute of Statistical Mathematics Vol.69, No.1, 99-118 (2021)

A Model for Social Network Analysis Considering Text Information on Social Media
—Extended Model Considering Node Degree Heterogeneity—

Mirai Igarashi
(Graduate School of Economics and Management, Tohoku University)
Nobuhiko Terui
(Graduate School of Economics and Management, Tohoku University)

In the study of social networks, it has become increasingly important to consider not only network information but also text information generated by people on social media in order to deeply understand the community structure. By taking text information into account, it is possible to analyze social networks with complex community structures, which have multiple clusters of people depending on their interests in a single network with densely connected edges. In this study, we extend an existing model that simultaneously considers network and text information, and propose a model with degree heterogeneity, in which the probability of edge generation varies from node to node. In an empirical analysis using a Twitter dataset, we compare the proposed model with baseline models that do not use text information or degree heterogeneity, and show its better predictive performance.

Key words: Social network analysis, community detection, text analysis, topic modeling, Bayesian inference, node heterogeneity.
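
To make the notion of degree heterogeneity in the abstract above concrete, the following is a minimal sketch in the style of a degree-corrected stochastic block model. The abstract does not specify the functional form of the proposed model, so everything here is an assumption for illustration: the node propensities `theta`, the community affinity matrix `B`, the Poisson-style link function, and the function names. Text information is not modeled in this sketch.

```python
import numpy as np

def edge_probabilities(z, theta, B):
    """Edge probabilities with node-specific propensities (hypothetical form).

    z     : (n,) integer community label of each node
    theta : (n,) positive propensity of each node to form edges
    B     : (K, K) symmetric affinity between communities
    """
    z = np.asarray(z)
    theta = np.asarray(theta, dtype=float)
    rate = np.outer(theta, theta) * B[np.ix_(z, z)]
    P = 1.0 - np.exp(-rate)      # chance of at least one edge under a Poisson count
    np.fill_diagonal(P, 0.0)     # no self-loops
    return P

def sample_graph(P, rng):
    """Draw an undirected adjacency matrix with independent edges."""
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)

if __name__ == "__main__":
    z = [0, 0, 1, 1]
    theta = [2.0, 0.5, 1.0, 1.0]             # node 0 is far more active than node 1
    B = np.array([[0.5, 0.1], [0.1, 0.5]])
    P = edge_probabilities(z, theta, B)
    print(P.sum(axis=1))                     # expected degree of each node
    print(sample_graph(P, np.random.default_rng(0)))
```

Nodes 0 and 1 belong to the same community, yet node 0 has a larger expected degree because of its larger propensity; a model without the `theta` term would have to treat them identically.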