AI That Masters Predictions Beyond Existing Data ―Transforming Data-Driven Materials Science―

ISM2024-07
3 March 2025

Developed E2T, a machine learning algorithm that learns how to learn for extrapolative prediction, together with its software implementation.
Achieved higher extrapolative prediction accuracy than conventional machine learning models in material property prediction tasks across diverse material systems.
Demonstrated that models exposed to extensive extrapolative tasks can acquire the ability to rapidly adapt to new tasks.

Overview
Kohei Noda, a researcher at JSR Corporation, and Professor Ryo Yoshida at the Institute of Statistical Mathematics, along with their research group, have developed an innovative machine learning technology that enables predictions beyond the distribution of training data and demonstrated its effectiveness in materials research.

The ultimate goal of materials science is to discover new materials in unexplored domains where no data exists. However, predictions made by machine learning are generally interpolative, with their applicability typically limited to regions close to the distribution of existing data. Additionally, in materials research, the high cost of data acquisition makes it difficult to obtain sufficient training data, necessitating exploration beyond the range of available data.

To address this challenge, the research group developed a machine learning algorithm called E2T (extrapolative episodic training). In E2T, a model known as a meta-learner is trained on a large number of artificially generated extrapolative tasks derived from the available dataset. In this way, the model learns for itself a procedure for making extrapolative predictions.

In this study, E2T was applied to material property prediction tasks, demonstrating high predictive accuracy even for materials with elemental and structural features not present in the training data. Furthermore, it was revealed that models trained on a large number of extrapolative tasks could rapidly acquire predictive capabilities in unknown domains with only a small amount of additional data.

These research findings were published in Communications Materials on 22 February 2025.

Research Outcomes
In recent years, the application of machine learning has led to remarkable progress in the discovery and development of new materials. At the core of this progress lies machine-learning-based property prediction. Predictive models make it possible to screen millions or even billions of candidate materials across vast search spaces and identify those with desired properties.

However, many studies face limited data availability, which restricts the range of applications for machine learning. Furthermore, the ultimate goal of materials science is to uncover unknown materials with groundbreaking properties, yet machine learning's predictive capabilities are generally confined to regions near the training data, making it difficult to explore uncharted territory. Even generative AI, such as the large language models that have revolutionized AI in recent years, is inherently interpolative: it essentially reproduces tasks that humans have encountered before. Developing AI technologies capable of predicting beyond existing data is a grand challenge not only for materials science but also for next-generation AI.

In the field of machine learning, various methodologies have been explored to achieve extrapolative predictions, including:

Domain Generalization: Techniques that aim to learn shared feature representations across diverse tasks.

Data Augmentation: Methods to enhance model performance by increasing the diversity of training data.

Integration of Physical Knowledge with Machine Learning: Approaches that embed prior knowledge, such as physical laws, into machine learning frameworks (e.g., physics-informed neural networks).

Meta-Learning: Techniques that train models to acquire generalized learning strategies by exposing them to a diverse range of tasks.

This study introduces a novel meta-learning approach that enables models to directly acquire broadly applicable learning methods for extrapolative predictions.

In this study, a neural network equipped with an attention mechanism was employed to train a model capable of learning the methods required for extrapolative prediction (Figure 1). Specifically, a training dataset $\mathcal{D}$ and an input-output pair $(x, y)$, extrapolatively related to $\mathcal{D}$, were sampled from a given dataset. Here, $x$ represents a material, and $y$ represents its property. These three components together form an "episode", and any number of episodes can be generated. Using a large number of artificially generated episodes, a meta-learner $y = f(x, \mathcal{D})$ was trained to predict $y$ from $x$ given $\mathcal{D}$. The trained model thus learns what function is needed to predict a pair $(x, y)$ that is extrapolatively related to any given training dataset. The research group named this novel learning algorithm E2T (extrapolative episodic training).
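To make the episode construction concrete, here is a minimal Python/NumPy sketch under stated assumptions. It is not the authors' implementation, which uses a neural network with attention trained by gradient descent (see the E2T repository). Instead it swaps in a simpler stand-in: the meta-learner $f(x, \mathcal{D})$ is kernel ridge regression written in attention form (weights over the examples in $\mathcal{D}$ applied to their labels), and "meta-training" is a grid search over two hyperparameters, `ell` and `lam`, that minimizes the error across many extrapolative episodes. The episode generator (split one feature at its median, draw $\mathcal{D}$ from one side and the query from the other), the kernel, and all function names are hypothetical.

```python
import numpy as np


def sample_extrapolative_episode(X, y, rng, n_train=30):
    """Hypothetical episode generator: returns (X_D, y_D, x_query, y_query).

    One feature is split at its median; the training set D is drawn from one
    side and the query (x, y) from the other, so the query lies outside the
    feature range that D covers.
    """
    j = rng.integers(X.shape[1])
    cut = np.median(X[:, j])
    inside = np.flatnonzero(X[:, j] <= cut)
    outside = np.flatnonzero(X[:, j] > cut)
    if rng.random() < 0.5:  # extrapolate upward or downward at random
        inside, outside = outside, inside
    train_idx = rng.choice(inside, size=min(n_train, len(inside)), replace=False)
    q = rng.choice(outside)
    return X[train_idx], y[train_idx], X[q], y[q]


def kernel(A, B, ell):
    """Linear + constant + RBF kernel (an assumption, not the paper's choice)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    # The linear term lets predictions continue trends beyond the data range;
    # the RBF term captures local nonlinear structure.
    return A @ B.T + 1.0 + np.exp(-sq / (2.0 * ell**2))


def meta_learner(x, X_D, y_D, ell, lam):
    """f(x, D): attention-like weights over the examples in D, applied to y_D."""
    K = kernel(X_D, X_D, ell)
    k = kernel(x[None, :], X_D, ell)  # similarity of the query to each example
    weights = np.linalg.solve(K + lam * np.eye(len(y_D)), k.T).ravel()
    return float(weights @ y_D)


def meta_train(X, y, grid, n_episodes=200, seed=0):
    """Pick (ell, lam) minimizing mean squared error over extrapolative episodes."""
    rng = np.random.default_rng(seed)
    episodes = [sample_extrapolative_episode(X, y, rng) for _ in range(n_episodes)]
    best, best_err = None, np.inf
    for ell, lam in grid:
        err = np.mean([(meta_learner(xq, X_D, y_D, ell, lam) - yq) ** 2
                       for X_D, y_D, xq, yq in episodes])
        if err < best_err:
            best, best_err = (ell, lam), err
    return best, best_err


# Usage on synthetic data (a stand-in for material descriptors and properties).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * data_rng.normal(size=200)
grid = [(ell, lam) for ell in (0.5, 1.0, 2.0) for lam in (1e-3, 1e-1, 1.0)]
theta, err = meta_train(X, y, grid)
print("selected (ell, lam):", theta, "episode MSE:", err)
```

Because $f$ takes the training set as an input, what is selected here is a prediction procedure that can be reused with any new $\mathcal{D}$, rather than a single fitted function; the real E2T learns far richer structure through its network weights.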

The research group applied E2T to over 40 property prediction tasks involving polymeric and inorganic materials to evaluate its performance (Figure 2). The results showed that, in almost all cases, models trained with E2T outperformed conventional machine learning models in extrapolative accuracy. For predictions near the training data, E2T achieved accuracy equal to or better than that of conventional machine learning models.
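The distinction between extrapolative accuracy and accuracy near the training data can be illustrated with a simple evaluation split. The sketch below assumes one possible way to build such a split (holding out the top 20% of property values as the extrapolative test set); it is not the protocol used in the paper, and it uses a plain k-nearest-neighbour regressor as a stand-in for a conventional interpolative model. The data and all function names are hypothetical.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=5):
    """Average the labels of the k nearest training points (purely interpolative)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def extrapolative_split(y, quantile=0.8, n_interp=40, seed=0):
    """Hypothetical split: values above the quantile form the extrapolative test
    set; a random subset of the remaining points forms the interpolative test set."""
    rng = np.random.default_rng(seed)
    cut = np.quantile(y, quantile)
    extrap = np.flatnonzero(y > cut)
    rest = rng.permutation(np.flatnonzero(y <= cut))
    return rest[n_interp:], rest[:n_interp], extrap  # train, interp, extrap


def mae(pred, y_true):
    return float(np.mean(np.abs(pred - y_true)))


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = 3.0 * X[:, 0] + X[:, 1]  # synthetic "property"
train, interp, extrap = extrapolative_split(y)
mae_interp = mae(knn_predict(X[train], y[train], X[interp]), y[interp])
mae_extrap = mae(knn_predict(X[train], y[train], X[extrap]), y[extrap])
print(f"interpolative MAE: {mae_interp:.3f}, extrapolative MAE: {mae_extrap:.3f}")
```

A nearest-neighbour model can never predict a value above the largest training label, so its error on the held-out high-value region is larger than on points inside the training distribution. Extrapolative accuracy refers to the first kind of test set, interpolative accuracy to the second.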

However, the extrapolative performance of E2T did not reach that of an ideal model (called an oracle) trained on the entire dataset, including the extrapolative region. In other words, while E2T consistently improved prediction accuracy in extrapolative regions, it fell short of "ultimate extrapolative capability."

A particularly noteworthy finding was that models trained on a large number of extrapolative tasks could quickly adapt to new extrapolative tasks through fine-tuning with a limited amount of data. Remarkably, these models achieved performance comparable to that of an oracle model trained on data from the extrapolative region, while requiring significantly less data. In humans, rapid adaptability is believed to result not only from innate traits but also from extensive training and experience. This study suggests that a similar phenomenon may occur in the learning processes of AI, where adaptability is enhanced through systematic exposure to diverse tasks.
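Why a model pretrained on many related tasks can adapt with little data can be illustrated with a deliberately simple stand-in: a linear model fine-tuned on three new examples by ridge regression that shrinks toward the pretrained weights `w0`, compared with the same fit started from zero. This is a generic illustration under stated assumptions (synthetic linear data, a small domain shift, hypothetical names), not the fine-tuning procedure used for E2T.

```python
import numpy as np


def fine_tune(X, y, w0, lam=1.0):
    """Solve argmin_w ||X w - y||^2 + lam * ||w - w0||^2 in closed form.

    w0 is the starting point: pretrained weights, or zeros for training from scratch.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w0)


rng = np.random.default_rng(0)
d = 5
w_source = rng.normal(size=d)                    # what pretraining captured
w_target = w_source + 0.1 * rng.normal(size=d)   # new domain: a small shift

X_few = rng.normal(size=(3, d))                  # only 3 labelled examples
y_few = X_few @ w_target
X_test = rng.normal(size=(500, d))
y_test = X_test @ w_target

results = {}
for name, w0 in [("from pretrained", w_source), ("from scratch", np.zeros(d))]:
    w = fine_tune(X_few, y_few, w0)
    results[name] = float(np.sqrt(np.mean((X_test @ w - y_test) ** 2)))
    print(f"{name}: test RMSE = {results[name]:.3f}")
```

The pretrained starting point already encodes most of what the new domain requires, so three examples suffice to correct the small shift; started from zero, the same three examples leave two of the five weight directions undetermined.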

Future Outlook
The ultimate goal of materials research lies in exploring uncharted material spaces where no data currently exists. For instance, researchers aim to investigate the properties of materials made from combinations of elements or raw materials that have never been tested, or produced under significantly altered sample fabrication protocols. This study began with a fundamental question: can models trained to extrapolate within existing datasets acquire extrapolative capabilities and adaptability to unknown environments? The researchers presented a remarkably simple answer to this question. While the current evidence is limited to specific cases, if the learning capability of E2T proves to be universal, its impact could extend beyond materials science to a wide range of fields within AI for Science.

One particularly exciting prospect is the application of E2T to the development of foundation models. Foundation models are trained on large-scale, versatile datasets and are expected to exhibit the ability to adapt to a wide variety of downstream tasks. By fine-tuning these models for specific downstream tasks, it is possible to reduce the amount of data required while achieving high predictive accuracy. The extrapolative performance and domain adaptability of E2T have the potential to drive groundbreaking innovations in the development of foundation models, significantly advancing the broader scientific landscape.

Publication
Title:      Advancing extrapolative predictions of material properties through learning to learn using extrapolative episodic training
Authors:    Kohei Noda, Araki Wakiuchi, Yoshihiro Hayashi, Ryo Yoshida
Journal:    Communications Materials
DOI:      10.1038/s43246-025-00754-x

Source code for E2T
https://github.com/JSR-ISM-Smart-Chemistry-Lab/E2T

Figure 1: Learning the learning method for extrapolative prediction using the E2T algorithm.

Figure 2: Band gap prediction of organic-inorganic hybrid perovskites using E2T.

Contact

[Research content]
Ryo Yoshida, Professor (Director)
E-mail: yoshidar@ism.ac.jp
Research Center for Materials Informatics, The Institute of Statistical Mathematics,
Research Organization of Information and Systems

[News, public relations]
URA Station, Planning Unit, Administration Planning and Coordination Section
The Institute of Statistical Mathematics, Research Organization of Information and Systems
TEL: +81-50-5533-8580
E-mail: ask-ura@ism.ac.jp
