This study reveals a significant degree of redundancy in materials data for machine learning: up to 95% of the data can be removed from training with little impact on in-distribution prediction performance. The redundancy stems from over-represented material types, and the redundant data does not mitigate performance degradation on out-of-distribution samples. Uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. The focus should be on information richness rather than data volume.
Extensive efforts to gather materials data have largely overlooked potential data redundancy. In this study, we present evidence of a significant degree of redundancy across multiple large datasets for various material properties, revealing that up to 95% of data can be safely removed from machine learning training with little impact on in-distribution prediction performance.
The redundant data is related to over-represented material types and does not mitigate the severe performance degradation on out-of-distribution samples. In addition, we show that uncertainty-based active learning algorithms can construct much smaller but equally informative datasets. We discuss the effectiveness of informative data in improving prediction performance and robustness and provide insights into efficient data acquisition and machine learning training. This work challenges the “bigger is better” mentality and calls for attention to the information richness of materials data rather than a narrow emphasis on data volume.
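To make the idea of uncertainty-based active learning concrete, here is a minimal sketch in Python. It is not the study's implementation: the bootstrap ensemble of straight-line fits, the one-dimensional inputs, and every name in it (`fit_line`, `bootstrap_ensemble`, `prediction_std`, `active_learning`, `oracle`) are assumptions chosen for illustration. It only shows the general loop: train on a small labeled set, estimate uncertainty on the unlabeled pool, and label the point the model is least sure about.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = a*x + b for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0.0:
        # Degenerate resample (all x identical): fall back to a constant model.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def bootstrap_ensemble(labeled, n_models, rng):
    """Fit one line per bootstrap resample of the labeled data."""
    models = []
    for _ in range(n_models):
        resample = [rng.choice(labeled) for _ in labeled]
        models.append(fit_line(resample))
    return models


def prediction_std(models, x):
    """Ensemble disagreement at x, used here as the uncertainty estimate."""
    return statistics.pstdev(a * x + b for a, b in models)


def active_learning(pool, oracle, n_initial=3, budget=10, n_models=20, seed=0):
    """Grow a labeled set up to `budget` points by querying the most uncertain one.

    `oracle` is a hypothetical labeling function (for example a simulation
    or an experiment) that returns the target value for an input x.
    """
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)
    labeled = [(x, oracle(x)) for x in candidates[:n_initial]]
    unlabeled = candidates[n_initial:]
    while len(labeled) < budget and unlabeled:
        models = bootstrap_ensemble(labeled, n_models, rng)
        # Query the candidate on which the ensemble disagrees most.
        query = max(unlabeled, key=lambda x: prediction_std(models, x))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return labeled


if __name__ == "__main__":
    pool = [i / 10 for i in range(100)]
    selected = active_learning(pool, oracle=lambda x: x ** 2, budget=10)
    print(sorted(x for x, _ in selected))
```

In practice the line fits would be replaced by whatever model is being trained, and the uncertainty estimate by one suited to that model; the selection loop itself stays the same.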
Materials Data, Redundancy, Machine Learning, Prediction Performance, Active Learning, Information Richness




