Data lakes are a way for organizations to store lots of (mostly) unstructured information, but their centralized mass brings an additional responsibility for control.
Data lakes are massive, by definition. They house the morass of unstructured and semi-structured data that is generally unfiltered, often duplicated, typically unparsed and low-level, and increasingly machine-generated by sensors in the Internet of Things, or by AI agents that now pour their output into the data lake as well.

On balance, data lakes are regarded as a good thing. They allow organizations to make sure they are capturing all the data they might channel through every operational pipe of their IT stack. Having access to as-yet-untapped data stores when needed is a comfortable position for the chief data scientist in any business. Viewed as a key move for firms to future-proof their data strategy, a data lake also represents a democratization of data i.e. it’s a really deep pool and, as long as they wear a life jacket, anyone including business users can potentially take a dip at any time. Data lakes also store structured data, such as information streams from customer relationship management or enterprise resource planning systems, but they are less frequently discussed in that role.

In our current climate of AI-everything, organizations are demanding end-to-end visibility of their businesses and the activities carried out by their customers. Data lakes help make that possible and they also ensure a business can centralize around one repository so that data silos don’t start to grow… and that’s a good thing too.

As in practically all aspects of technology, there’s a yin and yang factor to consider. If we think back to pre-millennial times, when an organization had 42 databases, users needed to know 42 sets of database attributes and a corresponding number of security measures and procedures to access data. In a single data lake, however, it is theoretically possible for a person with the right credentials to access everything via one entry point.
The fabled “single pane of glass” strategy that so many companies are chasing when it comes to data, apps and business actions becomes the same single pane an intruder needs to break to enter, warned Karam, head of product for AI and SaaS at DevOps platform company Perforce. Speaking at a data analytics roundtable this week, he highlighted more danger in the water.

“It’s always important to remember that there’s Sam, and most organizations have a Sam. They’ve been with the company for decades and, during their tenure, they built a database into which no one else has insight. Maybe Sam has now left the organization, so Sam’s database is effectively a black box. Now put Sam’s database in the single data lake and the implications could be huge,” suggested Karam. “But what if Sam’s data store includes duplicated personally identifiable information and the columns with that PII are no longer tracked? This would be an ideal feeding ground for the crocodiles dwelling beneath the lake’s surface. An already broken process just expanded.”

Karam invites us to add AI into the mix. Compared to analysts who are expert data wranglers and write targeted queries to get what they need, he says that AI has an “omnivorous, insatiable appetite” these days, which means it wants to eat all the data. He views it as something of a “blabbermouth” that spills more secrets than a chatty relative after too much wine at a holiday dinner. The risk landscape subsequently explodes.

“So we have a quandary: teams across enterprises depend on fast access to data to build and test software, get to market faster and optimize strategy… yet data lakes, useful as they are, concentrate all of that risk,” said Karam. “For an illustrative example, consider the fact that detailed data is increasingly essential to meet demand for customer experience customisation.
Yet the risks are very real: our own market study suggests that around half of organizations report having already experienced a breach or theft involving sensitive data in non-production environments.”

So what’s the answer? Cataloguing and dividing data into different categories is a good starting point, Karam says, and the Medallion architecture that Microsoft documents for Azure Databricks is a good example. Microsoft talks about this technology as the Medallion data lakehouse architecture; it is essentially a data design pattern used to organize data logically. “The medallion architecture describes a series of data layers that denote the quality of data stored in the lakehouse. Azure Databricks recommends taking a multi-layered approach to building a single source of truth for enterprise data products. This architecture guarantees atomicity, consistency, isolation and durability as data passes through multiple layers of validations and transformations before being stored in a layout optimized for efficient analytics,” details Microsoft, on the Microsoft Learn web portal.

Data Masking & Synthetic Data

“The next step is to find ways in which to give non-production teams realistic data without risk, so this means stepping into techniques including data masking and the use of synthetic data. Synthetic data is particularly beneficial when there is a lack of real data that matches a new business case, or when compliance demands that access to production data in any form is forbidden. It’s also fast to create and useful for large-volume requirements like unit testing,” explained Perforce’s Karam.

Static data masking replaces sensitive data such as personally identifiable information with synthetic but realistic values that are deterministic and persistent, so that referential integrity and demographics are maintained. This means that software developers have genuinely useful data without the risk of accidentally exposing sensitive customer data.
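The bronze/silver/gold layering that the medallion pattern describes can be sketched in a few lines. The sketch below is illustrative only, assuming plain Python structures rather than Azure Databricks tables; the layer functions, field names and sample records are all invented for the example.

```python
# Minimal sketch of medallion-style layering: raw ingested data ("bronze")
# is validated and deduplicated into "silver", then aggregated into a
# business-level "gold" view. All names and records here are hypothetical.

raw_events = [  # bronze: data exactly as ingested, duplicates and nulls included
    {"customer": "c1", "amount": "100.0"},
    {"customer": "c1", "amount": "100.0"},   # duplicate row
    {"customer": "c2", "amount": None},      # record that fails validation
    {"customer": "c2", "amount": "250.5"},
]

def to_silver(bronze):
    """Silver layer: validated, deduplicated, properly typed records."""
    seen, silver = set(), []
    for row in bronze:
        if row["amount"] is None:
            continue  # drop records that fail validation
        key = (row["customer"], row["amount"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        silver.append({"customer": row["customer"], "amount": float(row["amount"])})
    return silver

def to_gold(silver):
    """Gold layer: business-level aggregate, here total spend per customer."""
    totals = {}
    for row in silver:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals

silver = to_silver(raw_events)
gold = to_gold(silver)
print(gold)  # {'c1': 100.0, 'c2': 250.5}
```

The point of the pattern is that each layer is a checkpoint: consumers downstream only ever see data that has already passed the previous layer’s validations.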
As a working example, development teams at a bank could see a customer’s balance to look for anomalies, spikes or other outliers, but they would have no idea which customer it belongs to. Date of birth, social security number, bank account number and other personal identifiers would all be masked. Many organizations are likely to have a place for both techniques, which are supported by highly automated tools to mitigate any additional workload on developers.

“New use cases in AI can also help. Beyond synthetic data, AI is being used for automated testing with natural language processing, relieving testing teams from the burden of writing test scripts and maintaining data relationships with production,” said Karam. “Even if an organization is already ‘all in’ on data lakes, it should continue to treat software development and quality assurance data as separate data environments that are risk-averse, solid, clean, compliant and delivered fast, so that teams can build without concern. The data lake should also have separate workspaces for non-production teams with guaranteed compliant data so they can jump right in safely. It’s like having a roped-off children’s pool in the shallow end of the lake for non-production, while the production part in the deep end is off-limits.”

Key providers in the data lake arena include Amazon; Microsoft, with Azure Data Lake and the company’s data lake analytics service; Google, with BigLake; AI data cloud company Snowflake; and Databricks, with its already-referenced relationship to Microsoft. Although Perforce didn’t peddle its own agenda or message set in this discussion, the company competes in version control with Git, Atlassian Bitbucket Data Center, Apache Subversion and Mercurial, to name a handful. In software testing, Perforce shares its market with BrowserStack, Sauce Labs and LambdaTest, and in application lifecycle management it comes up against IBM’s Engineering Lifecycle Management, among others.
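The deterministic masking behaviour described above, where the same real identifier always maps to the same synthetic value so joins across tables still work, can be sketched with keyed hashing. This is a minimal illustration, not a Perforce product feature: the secret key, field names and record are invented, and a real tool would also generate format-preserving values rather than hex strings.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice this would live in a secrets manager
# and never be shipped to non-production environments alongside the data.
SECRET_KEY = b"masking-key-kept-out-of-non-prod"

def mask(value: str, domain: str) -> str:
    """Deterministically pseudonymize a value within a domain (e.g. 'ssn').

    HMAC makes the mapping one-way and keyed, while determinism means the
    same input always yields the same output, preserving referential
    integrity across every table that carries the identifier.
    """
    digest = hmac.new(SECRET_KEY, f"{domain}:{value}".encode(), hashlib.sha256)
    return digest.hexdigest()[:12]

# Illustrative bank record: identifiers are masked, the balance a developer
# actually needs for anomaly-hunting is left intact.
customer = {"name": "Jane Doe", "ssn": "078-05-1120", "balance": 1204.17}
masked = {
    "name": mask(customer["name"], "name"),
    "ssn": mask(customer["ssn"], "ssn"),
    "balance": customer["balance"],
}

# Determinism preserves joins: the same SSN masks identically everywhere.
assert mask("078-05-1120", "ssn") == masked["ssn"]
```

Scoping the hash to a domain (`"ssn"`, `"name"`) keeps the same raw string from producing the same pseudonym across unrelated fields, which would otherwise leak correlations.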
Taking the steps and approaches tabled here could help to pinpoint, ring-fence and mitigate the risks around data lake information, balancing its usefulness against the need for its protection. The crocodiles may still be circling, but there are safe ways to enter the water if we know what kind of protective clothing to wear. These processes might not kill off the lake crocodiles, but they might force a few of them back to shore.