Unstructured data inside organizations appears to be full of energy, but it is weighed down by an inertia which precludes it from being useful.
Information, without order, is chaotic. Attempting to work with data without structure and form is rather like watching white noise fuzz on an un-cabled television set, where shapes are almost familiar, but devoid of any recognizable manifestation. Unstructured data inside organizations appears to be full of energy, but it is weighed down by an inertia which precludes it from being useful, primarily because it doesn’t know which home it belongs to.

To define the term, let’s first say that structured data includes spreadsheets with their formalized rows and columns, “form-based” data resources where we know the fields in a document and so we know what values to expect… and of course relational databases, the purest form of an ordered and structured data repository. Unstructured data, therefore, includes non-tabular data spanning records of phone calls and voicemails; it is raw video that has yet to get meta-tagged to explain its contents; it is blogs and web pages; it’s emails and also social media posts in all their forms.

Some data that may appear structured is still essentially unstructured. For example, 6,000 temperature readings and gyroscope movement records aren’t necessarily structured just because they are numbered by sequence; they need to be extracted, parsed, deduplicated and manipulated to become structured for productive use. In so many cases, unstructured data is regarded as an untapped source of real business context, but it is often the hardest to bring in line, the hardest to govern and the toughest to operationalize.

The unclassified morass of information has been referred to as the “unseen data conundrum” and unsiloed reserves of unstructured data are estimated to now make up “the majority of enterprise information” today. IDC also suggests that it is more than doubling each year.
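To make the sensor example above a little more concrete, here is a minimal Python sketch of the “extract, parse, deduplicate” step. The raw line format, the `Reading` record and the `structure_readings` function are all hypothetical, invented purely for illustration; real device logs will look different and need their own parsing rules.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical raw log format, assumed for illustration only:
#   "<sequence>: temp=<celsius>C gyro=<x>,<y>,<z>"
NUMBER = r"-?\d+(?:\.\d+)?"
LINE_PATTERN = re.compile(
    rf"^\s*(?P<seq>\d+):\s*temp=(?P<temp>{NUMBER})C\s+"
    rf"gyro=(?P<x>{NUMBER}),(?P<y>{NUMBER}),(?P<z>{NUMBER})\s*$"
)


@dataclass(frozen=True)
class Reading:
    """One structured sensor record (hypothetical schema)."""
    seq: int
    temp_c: float
    gyro: Tuple[float, float, float]


def structure_readings(raw_lines: List[str]) -> Tuple[List[Reading], List[str]]:
    """Parse raw lines into records, dropping repeated sequence numbers.

    Returns the records sorted by sequence number, plus any non-blank
    lines that did not match the assumed format.
    """
    by_seq = {}
    rejected = []
    for line in raw_lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            if line.strip():
                rejected.append(line)
            continue
        reading = Reading(
            seq=int(match["seq"]),
            temp_c=float(match["temp"]),
            gyro=(float(match["x"]), float(match["y"]), float(match["z"])),
        )
        # Deduplicate: keep only the first record seen for each sequence number.
        by_seq.setdefault(reading.seq, reading)
    return [by_seq[seq] for seq in sorted(by_seq)], rejected
```

Keeping the rejected lines, instead of silently discarding them, is a deliberate choice in this sketch: whatever fails to parse is exactly the kind of data blind spot that still needs governing.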
These data blind spots are thought to create operational risk and to potentially undermine the value of AI. This is important now because organizations are using unstructured data to power large language models and retrieval-augmented generation applications.

There’s a whole marketplace structure of unstructured technology toolset vendors today. Amazon Web Services offers an entire menu of functions in this space. Amazon Comprehend is a natural language processing and machine learning service capable of extracting metadata and key phrases and determining sentiment from text in multiple languages. AWS positions this service alongside the Amazon Transcribe speech-to-text tools, the quirkily named Amazon Rekognition image and video analysis service… and there’s also Amazon Textract, which extracts metadata from scanned documents and images.

Given the breadth of AWS services in this market, it would be reasonable to expect similar-but-skewed proprietary versions of these functions from the other major cloud hyperscalers. Microsoft Azure Cosmos DB is a globally distributed, multi-model database with enough intelligence to be able to manage structured, semi-structured and unstructured data. This cloud-native database might be used alongside the playfully named Microsoft Blob Storage service, an object storage service designed for storing large amounts of unstructured data that might exist in images, videos, documents and other binary data. Also from Microsoft, AI Document Intelligence uses machine learning to extract text, key-value pairs, tables and structures from documents automatically.

Not to be left out, Google Cloud Platform also works at this level. The cloud and search giant points to its BigQuery brand and the object tables function within it. “Object tables provides a structured record interface for unstructured data stored in Google Cloud Storage. This enables [users] to directly run analytics and machine learning on images, audio, documents and other file types using existing frameworks like SQL and remote functions natively in BigQuery itself,” noted Google Cloud’s Gaurav Saxena and Thibaud Hottelier, at the time of this product’s launch a couple of years back.

Given the services that exist as fairly prominent functions in the major cloud providers and the toolsets that exist from more specialized players, working with unstructured data is clearly now a more pressing need. Often referred to as enterprise content management, ECM is certainly growing in the combined shadow of big data analytics and the rise of artificial intelligence. The natural evolution for a data market like this is the arrival of industry-specific services aligned to industry verticals. Known for its work in unstructured data management across the healthcare industry, Hyland treads a careful line with its messaging as the company clearly wants to be seen as applicable to all use cases. The company says Hyland Content Intelligence turns unstructured data into actionable, AI-ready content, with the 2025 arrival of its Knowledge Enrichment service being among its star players.

Related technologies are also present at IBM in the form of Watson Discovery for unstructured search and AI; Elastic for indexing and querying of unstructured text and logs; Cloudera for Hadoop-based data lake services across unstructured and semi-structured data; and Databricks, Collibra, Alation, Palantir and Varonis, to name but a mouthful. There is a lot of structure being applied to the unstructured data space.

“Unstructured data remains a black box for most organizations, as it becomes critical for AI and business operations,” said Jay Limburn, chief product officer at Ataccama.
“Without a way to structure, govern and trust that information, enterprises risk missing the full value of their data.” Limburn points to his firm’s Ataccama One platform as a means to combine data quality, governance, observability, lineage and master data management. Ataccama One is now available on Snowflake Marketplace as a new integration with Document AI, a Snowflake AI feature that uses Arctic-TILT, a proprietary large language model used to extract data from documents. This fusion of data structuring services is billed as a means of turning unstructured content, such as contracts, invoices and PDFs, into structured data by running models directly within Snowflake. Businesspeople can use natural language prompts.

Where does the unstructured marketplace go next? If we accept the proposition that AI services are partly responsible for the surge in this sector, then we might actually see AI services themselves starting to shoulder the responsibility for structuring our unstructuredness. Given the current debate over whether chat-based AI services will take over browser search - and the fact that OpenAI offers GPT-based APIs for text extraction, summarization, semantic intent analysis and classification - that might be exactly what happens.
