Cloudera, the platform for machine learning and data analytics, looks back at the data economy in 2018 -- and predicts key developments in the new year
December 29, 2018: 2018 was a year of fundamental change, underpinned by the impact of data management and analytics and, of course, GDPR. This commentary from Cloudera looks back over the year and ahead to six key tech areas for 2019…
We have created IoT solutions and edge networks that are far too gullible and trusting.
In 2019, security has to be the number one focus for organisations that want to ensure the safety and efficacy of edge devices and networks. There are too many vulnerabilities and gaps in the security posture of IoT devices, so organisations must take a proactive approach to securing them. They must use the data, metadata and device logs, treating IoT devices like any other network device, to predict and accurately respond to the available signals.
Context is the next major frontier in IoT.
IoT has created more data islands. We are now starting to bridge those islands, but we don’t yet speak the same collective language. What is needed is the ability to acquire data from disparate systems and align it on common ontologies so that we can trust and use it. The clock speed of decision-making is increasing while information expands exponentially beneath our feet. As AI and machine learning evolve, allowing these capabilities to organize the data, attribute it from a universe of observations, and produce auto-didactic insights will give us opportunities not yet imagined. Lineage, or “what did we know and when did we know it”, will be a key capability that allows organizations to use data optimally.
Next year we will see further use cases of IoT in the home and in smart cities, and more industrial use cases in automation and autonomous vehicles. Technology ecosystems are forming, so a holistic view of data from the cloud to the edge is important to maximise the benefit of the data used across these ecosystems. Cloudera can provide that view by making sense of the community, providing the value add, and protecting both the data and the consumer by assuring governance and security.
The fines associated with non-compliance with the regulation are significant: up to 4% of annual global turnover or €20 million, whichever is greater. Even if an organisation would not flinch at those kinds of numbers, the impact on its reputation would certainly get it to care about complying. GDPR is to a large extent about showing your customers and employees that you are careful with their data, that it is used for the right purpose and that, ultimately, they have control. With that control also comes trust, and every organisation cares about that.
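As a rough illustration of that "whichever is greater" rule, the sketch below computes the upper-tier ceiling set out in Article 83(5) of the GDPR (EUR 20 million or 4% of total worldwide annual turnover). The function name and the turnover figures are hypothetical, and the result is only the statutory maximum, not the fine a regulator would actually impose.

```python
# Minimal sketch of the GDPR upper-tier fine ceiling (Article 83(5)).
# Function name and sample turnovers are illustrative assumptions.

FIXED_CAP_EUR = 20_000_000   # fixed ceiling in euros
TURNOVER_PERCENT = 4         # percentage of worldwide annual turnover


def max_gdpr_fine_eur(annual_global_turnover_eur: float) -> float:
    """Return the maximum possible fine: the greater of the two ceilings."""
    turnover_based = TURNOVER_PERCENT * annual_global_turnover_eur / 100
    return max(FIXED_CAP_EUR, turnover_based)


if __name__ == "__main__":
    # Below EUR 500 million turnover the fixed ceiling applies;
    # above it, the 4% ceiling is the larger of the two.
    for turnover in (100_000_000, 500_000_000, 1_000_000_000):
        print(turnover, max_gdpr_fine_eur(turnover))
```

The crossover sits at EUR 500 million of turnover, which is why the percentage-based ceiling is the one that matters for large enterprises.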
Companies are now accountable for how they treat privacy and personal data. That accountability covers GDPR-regulated data across the complete data flow, including the partners with whom they need to exchange information. It also makes it crucial for smaller organisations that supply larger ones to achieve and maintain GDPR compliance, as compliance becomes a competitive differentiator.
The effect on cloud computing is that organisations must ensure the cloud services they use are compliant and that the systems and applications they design do not expose them to risk.
Do you think GDPR will expand and become a global regulation in 2019?
Expanding GDPR to become a global regulation is certainly a potential further evolution. Cloudera customers and organisations that would not be subject to the regulation are already taking it as the starting point for their own personal data privacy and protection guidelines. For it to become a truly global regulation, though, it will first need to prove its worth in its current form; once that has progressed well and has proven workable, the chances of it influencing international practice will be much higher. GDPR in its current form may, and likely will, also evolve further. Organisations that build a solid foundation now will be able to maintain compliance with less effort as the regulation evolves.
80% of all healthcare data is unstructured. For clinicians, doctors, nurses and surgeons, an incredible amount of insight that could help them understand patient records better remains hidden away in troves of clinical notes, EHR data, medical images, and omics data. We are currently witnessing a revolution in the healthcare industry, which now has an opportunity to employ a new model of improved, personalized, evidence-based and data-driven clinical care.
To arrive at quality data, organizations are spending significant effort on data integration, visualization, and deployment activities, but they are increasingly held back by budgetary constraints and limited data science resources. Healthcare faces many challenges, including developing, deploying, and integrating machine learning and artificial intelligence (AI) into clinical workflow and care delivery. Proper infrastructure with the required storage and processing capacity will be needed to efficiently design, train, execute, and deploy machine learning and AI solutions. Cloudera is committed to supporting healthcare professionals and institutions as they move to the next stage of patient care and medical development.
4. Data Warehousing
Data Management goes Cloud? As more organizations see the economic and ease-of-use advantages of the cloud, we expect to see increased investment in data management in the cloud. Data analytics use cases continue to lead the charge, especially for self-service, transient, and short-term workloads. Yet with new technologies that allow us to share data context (security models, metadata, source, and transformation definitions), we will see many organizations grow their use of cloud data management as more than just a complement to on-premises models, and move to private and hybrid cloud deployments with greater confidence. New data types, including social media and Internet of Things (IoT) data, will continue to be required to satisfy business analytics, driving the need for inexpensive, flexible storage that is best served by data management in the cloud. The cloud will also support new and emerging use cases such as exploration (iteratively performing ad-hoc queries on data sets to gain insights by discovering patterns) and machine learning without increasing IT resource demands, fueling further adoption.
5. Machine Learning
We are just at the beginning of the enterprise machine learning transformation. In 2019, we'll see a new step in maturity, as companies advance from PoCs to production capabilities.
Enterprise machine learning (ML) adoption will continue as businesses look to automate pattern detection, prediction and decision making to drive transformational efficiency improvements, competitive differentiation and growth. As early adopters advance from proofs of concept to production deployment of multiple use cases, we’ll continue to see technologies and best practices emerge that help operationalize, scale and ultimately industrialize these capabilities to achieve their full transformational value.
As companies understand the value of cloud to their existing infrastructure and applications, choice will become increasingly important. The option to mix public cloud and on-prem, as well as multi-cloud, gives companies the flexibility to choose the solution that best fits their needs. Any vendor that offers only one option and “locks in” a company will put its customers at a disadvantage. With this choice of deployment options, the need for a consistent framework that ensures security, governance, and metadata management will become even more important. Such a framework will simplify the development and deployment of applications, regardless of where data is stored and applications are run. It will also ensure that companies can use a variety of machine learning and analytic capabilities in concert, bringing data from different sources into a single coherent picture without the associated complexity.
These options are part of a larger move to a hybrid cloud model, in which workloads and data run in private cloud, public cloud, or both, based on the needs of the company. Bursting, especially with large amounts of data, is time-consuming and not an optimal use of hybrid cloud. Instead, specific use cases such as running transient workloads in the public cloud and persistent workloads in private cloud provide a “best of both worlds” deployment. The hybrid model is a challenge for vendors that offer only public cloud or only private cloud. To prepare for this scenario, vendors are making acquisitions, most recently the acquisition of Red Hat by IBM. Expect more acquisitions and mergers among vendors looking to broaden their product offerings for hybrid cloud deployments.