Application of LLM to Search and Systematize the Properties of Thermoelectric Materials in Scientific Literature
DOI:
https://doi.org/10.63527/1607-8829-2025-1-16-25
Keywords:
thermoelectricity, materials science, machine learning, large language models, thermoelectric energy converters, computer simulation
Abstract
Thermoelectric materials are used in a variety of fields because they convert heat directly into electricity. Selecting the optimal thermoelectric material is a challenging task, constrained by empirical, time, and economic factors. Recent advances in artificial intelligence (AI), in particular large language models (LLMs), show significant potential for automatically extracting and organizing information on the properties of thermoelectric materials from the scientific literature. This review traces the evolution of machine-learning-based methods, from early unsupervised NLP models such as Word2Vec to modern approaches using GPT models. The reviewed results show that LLMs enable efficient identification of promising new thermoelectric materials, automated collection of experimental data, and the construction of structured databases, which significantly accelerates the search for high-efficiency materials. The paper also outlines directions for further research, such as extending the methods to tabular and graphical data and optimizing the use of computational resources.
References
1. Anatychuk L. I., Prybyla A. V. (2017). Limiting possibilities of thermoelectric liquid-liquid heat pumps. J. Thermoelectricity, 4, 51–55.
2. Rifert V., Anatychuk L., Barabash P., Solomakha A., Usenko V., Prybyla A., Sereda V. (2019). Comparative analysis of thermal distillation methods with heat pumps for long space flights. J. Thermoelectricity, 4, 5–17. Retrieved from http://jte.ite.cv.ua/index.php/jt/article/view/70
3. Anatychuk L., Lysko V., Prybyla A. (2022). Rational areas of using thermoelectric heat recuperators. J. Thermoelectricity, 3-4, 43–67. https://doi.org/10.63527/1607-8829-2022-3-4-43-67
4. Anatychuk L., Prybyla A., Korop M., Kiziuk Y., Konstantynovych I. (2024). Thermoelectric power sources using low-grade heat: Part 1. J. Thermoelectricity, 1-2, 90–96. https://doi.org/10.63527/1607-8829-2024-1-2-90-96
5. Anatychuk L. (2020). Efficiency criterion of thermoelectric energy converters using waste heat. J. Thermoelectricity, 4, 58–63. Retrieved from http://jte.ite.cv.ua/index.php/jt/article/view/47
6. Anatychuk L. I., Lysko V. V., Havryliuk M. V. (2018). Ways for quality improvement in the measurement of thermoelectric material properties by the absolute method. J. Thermoelectricity, 2, 90–100.
7. Anatychuk L. I., Lysko V. V., Havryliuk M. V., Tiumentsev V. A. (2018). Automation and computerization of measurements of thermoelectric parameters of materials. J. Thermoelectricity, 3, 80–88.
8. Anatychuk L. I., Lysko V. V. (2012). Investigation of the effect of radiation on the precision of thermal conductivity measurement by the absolute method. J. Thermoelectricity, 1, 65–73.
9. Anatychuk L. I., Lysko V. V. (2012). Modified Harman's method. AIP Conference Proceedings, 1449, 373–376. https://doi.org/10.1063/1.4731574
10. Korop M. M. (2023). Machine learning in thermoelectric materials science. J. Thermoelectricity, 1, 44–54. Institute of Thermoelectricity. https://doi.org/10.63527/1607-8829-2023-1-44-54
11. Anatychuk L. I., Korop M. M. (2023). Application of machine learning to predict the properties of Bi2Te3-based thermoelectric materials. J. Thermoelectricity, 2, 59–71. Institute of Thermoelectricity. https://doi.org/10.63527/1607-8829-2023-2-59-71
12. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser L., Polosukhin I. (2017). Attention Is All You Need (Version 7). arXiv. https://doi.org/10.48550/ARXIV.1706.03762
13. Tshitoyan V., Dagdelen J., Weston L., Dunn A., Rong Z., Kononova O., Persson K. A., Ceder G., Jain A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. In Nature, 571(7763), 95–98. Springer Science and Business Media LLC. https://doi.org/10.1038/s41586-019-1335-8
14. Sierepeklis O., Cole J. M. (2022). A thermoelectric materials database auto-generated from the scientific literature using ChemDataExtractor. In Scientific Data (Vol. 9, Issue 1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41597-022-01752-1
15. Jia X., Yao H., Yang Z., Shi J., Yu J., Shi R., Zhang H., Cao F., Lin X., Mao J., Wang C., Zhang Q., Liu X. (2023). Advancing thermoelectric materials discovery through semi-supervised learning and high-throughput calculations. In Applied Physics Letters, 123(20). AIP Publishing. https://doi.org/10.1063/5.0175233
16. Thway M., Low A. K. Y., Khetan S., Dai H., Recatala-Gomez J., Chen A. P., Hippalgaonkar K. (2024). Harnessing GPT-3.5 for text parsing in solid-state synthesis – case study of ternary chalcogenides. In Digital Discovery, 3(2), 328–336. Royal Society of Chemistry (RSC). https://doi.org/10.1039/d3dd00202k
17. Polak M. P., Morgan D. (2024). Extracting accurate materials data from research papers with conversational language models and prompt engineering. In Nature Communications, 15(1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41467-024-45914-8
18. Dagdelen J., Dunn A., Lee S., Walker N., Rosen A. S., Ceder G., Persson K. A., Jain A. (2024). Structured information extraction from scientific text with large language models. In Nature Communications, 15(1). Springer Science and Business Media LLC. https://doi.org/10.1038/s41467-024-45563-x
19. Itani S., Zhang Y., Zang J. (2025). Large Language Model-Driven Database for Thermoelectric Materials (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2501.00564