<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Big data Archive - agile Companies</title>
	<atom:link href="https://agile-companies.com/tag/big-data-en/feed/" rel="self" type="application/rss+xml" />
	<link>https://agile-companies.com/tag/big-data-en/</link>
	<description>Flexible, modern &#38; digital</description>
	<lastBuildDate>Thu, 08 Apr 2021 20:17:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://agile-companies.com/wp-content/uploads/2021/03/cropped-agile-facebook-1-32x32.jpg</url>
	<title>Big data Archive - agile Companies</title>
	<link>https://agile-companies.com/tag/big-data-en/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>What is data mining?</title>
		<link>https://agile-companies.com/what-is-data-mining/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:29:38 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/what-is-data-mining/</guid>

					<description><![CDATA[<p>The term data mining often comes up in connection with storing and managing information in the big data field. Many companies use data mining as a tool: it systematically applies computer-based methods to find patterns, trends and relationships within large databases. It builds on various findings from the fields [...]</p>
<p>The post <a href="https://agile-companies.com/what-is-data-mining/">What is data mining?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>The term data mining often comes up in connection with storing and managing information in the big data field. Many companies use data mining as a tool: it systematically applies computer-based methods to find patterns, trends and relationships within large databases.</p>



<p>It builds on findings from computer science, statistics and mathematics and applies them to the analysis of databases. These analyses aim to find connections, patterns, trends and relationships in the information held in large databases and to make them usable.</p>



<p>Data mining runs in an automated manner, which saves both money and time. Companies can then use the results to make decisions about strategy or problem solving more easily.</p>



<h2 class="wp-block-heading">Functions</h2>



<p>Companies usually use data mining to pursue several goals. To achieve them, it has to perform a variety of tasks.</p>



<p>These include:</p>



<p><strong>Classification:</strong> Objects are assigned to classes.</p>



<p><strong>Segmentation:</strong> Objects with similar features are combined into groups.</p>



<p><strong>Forecast:</strong> Prediction of unknown or new features.</p>



<p><strong>Dependency analysis:</strong> Discovery of connections and relationships between the features of objects.</p>



<p><strong>Deviation analysis:</strong> Identification of objects whose features do not follow the dependencies that hold for other objects.</p>
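


<p>To make deviation analysis more concrete, here is a minimal sketch in Python. It assumes a simple z-score rule (distance from the mean, measured in standard deviations); the function name <code>find_deviations</code>, the threshold and the sample amounts are made up for illustration, and real data mining tools use more robust methods.</p>

```python
from statistics import mean, stdev

def find_deviations(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction amounts for one customer
amounts = [52, 48, 50, 51, 49, 53, 47, 50, 500]
print(find_deviations(amounts))
```

<p>With these sample values, only the amount of 500 lies more than two standard deviations from the mean and is reported as a deviation.</p>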



<h2 class="wp-block-heading">Significance for big data areas</h2>



<p>Although big data often serves as the setting for data mining, data mining is not tied to it. Data mining simply describes the analysis of data stocks for features and relationships of individual objects. It is often used on large databases, as in the area of big data, but it can also be used on smaller ones.</p>



<p>Nonetheless, it is found far more frequently in the context of big data, where it uses the technical foundation to obtain information effectively from existing data. In addition to artificial intelligence, it also uses statistical algorithms. This brings more structure and transparency and delivers more relevant results, especially with large databases that are often confusing.</p>



<h2 class="wp-block-heading">Who can benefit from data mining?</h2>



<p>Data mining is already used in practice in many areas because it offers users great potential. It is currently widely used in finance, marketing and medicine, and even as a tool for police analysis. Banks and insurance companies, for example, also use data mining for better customer service and for risk analyses. It can also be used to analyze customers&#8217; buying behavior and is therefore very popular with online shops.</p>



<p>The following articles also provide more on the subject of data and big data:</p>



<ul class="wp-block-list"><li><a href="https://agile-companies.com/what-is-big-data-definition/">What is big data</a></li><li><a href="https://agile-companies.com/big-data-with-hadoop/">Big data: yesterday, today and tomorrow!</a></li><li><a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a></li><li><a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a></li></ul>






<p>Image source: <a href="https://pixabay.com">pixabay.com</a></p>


<p>The post <a href="https://agile-companies.com/what-is-data-mining/">What is data mining?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How does a data warehouse work?</title>
		<link>https://agile-companies.com/data-warehouse-work/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:29:38 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/how-does-a-data-warehouse-work/</guid>

					<description><![CDATA[<p>In the context of big data, companies need powerful platforms that can efficiently store large amounts of data. One such platform is the data warehouse, which analyzes the information it contains according to certain patterns. Data warehousing process The data warehousing process, which is often used to describe how it works, [...]</p>
<p>The post <a href="https://agile-companies.com/data-warehouse-work/">How does a data warehouse work?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In the context of big data, companies need powerful platforms that can efficiently store large amounts of data. One such platform is the data warehouse, which analyzes the information it contains according to certain patterns.</p>



<h2 class="wp-block-heading">Data warehousing process</h2>



<p>The data warehousing process, which is often used to describe how a data warehouse works, comprises four main steps in which the data is managed in the data warehouse and evaluated for results.</p>



<h4 class="wp-block-heading">The 4-stage analysis process of a data warehouse</h4>



<ol class="wp-block-list"><li>Acquisition of data from the source system</li><li>Loading the data</li><li>Backup of the data</li><li>Analysis and evaluation of the stored data</li></ol>



<h2 class="wp-block-heading">This is how a data warehouse is structured</h2>



<p>A data warehouse, like a real building, is a construct made up of several elements. The foundation is an operational database that contains a large amount of information. On top of it sits the so-called staging area, whose task is to pre-sort the information. Only after special ETL processes have extracted, transformed and loaded the data according to a predefined structure does the information reach the data warehouse itself. This allows the data to be accessed separately, independently of the operational data stores. Finally, the information can be queried with special data access tools. This is possible at different levels, the so-called data marts.</p>
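


<p>The path from the operational database through the staging area into the warehouse can be sketched in a few lines of Python. This is only an illustration under simple assumptions: an in-memory SQLite database stands in for the warehouse, and the function <code>run_etl</code>, the table <code>sales</code> and the sample rows are hypothetical.</p>

```python
import sqlite3

def run_etl(source_rows):
    """Extract rows, transform them to a uniform shape, load them into a warehouse table."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (region TEXT, amount_eur REAL)")
    # Staging area: keep the extracted rows unchanged before transforming them
    staging = list(source_rows)
    # Transform: normalise region names and convert cent amounts to euros
    cleaned = [(region.strip().upper(), cents / 100) for region, cents in staging]
    # Load: write the cleaned rows into the warehouse table
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    warehouse.commit()
    return warehouse

# Hypothetical rows from an operational database: (region, amount in cents)
operational_rows = [(" north", 1250), ("South ", 990), ("north", 750)]
dw = run_etl(operational_rows)
totals = dw.execute(
    "SELECT region, SUM(amount_eur) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

<p>The final query plays the role of a data access tool: it reads only from the warehouse table and never touches the operational rows.</p>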



<p>To structure large amounts of data even better, so-called OLAP databases can also be used. They consolidate information from different areas and can map relationships and hierarchies efficiently.</p>
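


<p>What consolidating along a hierarchy means can be shown with a small roll-up in Python. This is a sketch under assumptions, not how an OLAP database is implemented internally: the <code>facts</code> list, the country and city hierarchy and the function <code>roll_up</code> are invented for the example.</p>

```python
from collections import defaultdict

# Hypothetical revenue facts with a country -> city hierarchy
facts = [
    {"country": "DE", "city": "Berlin", "revenue": 120},
    {"country": "DE", "city": "Munich", "revenue": 80},
    {"country": "FR", "city": "Paris", "revenue": 150},
    {"country": "DE", "city": "Berlin", "revenue": 30},
]

def roll_up(rows, dimensions):
    """Sum revenue for each combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[dim] for dim in dimensions)
        totals[key] += row["revenue"]
    return dict(totals)

by_city = roll_up(facts, ["country", "city"])
by_country = roll_up(facts, ["country"])
print(by_city)
print(by_country)
```

<p>The same facts are summarized once at city level and once at country level, which is the kind of view across hierarchy levels that OLAP tools offer.</p>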



<p>However, it should be noted that every data warehouse is only as high-quality as the data on which it is based. Poor data quality or incomplete data stocks can lead to considerable problems in the analysis processes.</p>



<h2 class="wp-block-heading">Data warehouse tasks</h2>



<p>In the context of big data, it is now essential for companies to have an overview of the mass of information in order to be able to efficiently evaluate the stored data. For this reason, a data warehouse usually has four important tasks.</p>



<ul class="wp-block-list"><li><strong>Central collection of all data:</strong> Data is consolidated at one collection point.</li><li><strong>Sorting of the data stocks:</strong> Separation into analytical and unprocessed data sets in order to obtain unadulterated results.</li><li><strong>Data integration:</strong> Combination of data from different sources and formats into a model that can be evaluated.</li><li><strong>Long-term storage of the data:</strong> The data is kept as a history for specific queries and time-based analyses.</li></ul>



<h2 class="wp-block-heading">Advantages and disadvantages</h2>



<p>A data warehouse is used by many companies as a helpful tool when it comes to storing large amounts of data. In addition to numerous advantages, there are also some disadvantages when using it.</p>



<h4 class="wp-block-heading">Advantages</h4>



<ul class="wp-block-list"><li>Powerful storage for large amounts of data</li><li>Special tools for the individual areas</li><li>Data quality management</li></ul>



<h4 class="wp-block-heading">Disadvantages</h4>



<ul class="wp-block-list"><li>Sometimes long loading times (especially with increasing volumes of data)</li><li>Unstructured data cannot be processed (in particular video or audio)</li><li>No possibility of real-time streaming</li></ul>



<p>The following articles also provide more on the subject of data and big data:</p>



<ul class="wp-block-list"><li><a href="https://agile-companies.com/what-is-big-data-definition/">What is big data</a></li><li><a href="https://agile-companies.com/data-mart-and-data-lineage/">Big data: yesterday, today and tomorrow!</a></li><li><a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a></li><li><a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a></li></ul>






<p>Image source: <a href="https://pixabay.com">pixabay.com</a></p>


<p>The post <a href="https://agile-companies.com/data-warehouse-work/">How does a data warehouse work?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Big data with Hadoop</title>
		<link>https://agile-companies.com/big-data-with-hadoop/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:29:37 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/big-data-with-hadoop/</guid>

					<description><![CDATA[<p>Data processing in the area of big data poses great difficulties for many companies. To counteract this problem, many organizations use tools such as software frameworks. One of these is Hadoop, which is based on Java. What is Hadoop The Java-based software framework Hadoop is easiest to picture as a kind of shell that [...]</p>
<p>The post <a href="https://agile-companies.com/big-data-with-hadoop/">Big data with Hadoop</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Data processing in the area of big data poses great difficulties for many companies. To counteract this problem, many organizations use tools such as software frameworks. One of these is Hadoop, which is based on Java.</p>



<h2 class="wp-block-heading">What is Hadoop?</h2>



<p>The Java-based software framework Hadoop is easiest to picture as a kind of shell that can be adapted to a wide variety of architectures and run on a wide variety of hardware. </p>



<p>The framework was invented by Doug Cutting, who had developed Hadoop into a top-level project of the Apache Software Foundation by 2008. Cutting designed the software framework for better management of distributed and scalable systems. It is based on Google&#8217;s MapReduce algorithm, which Hadoop uses to run detailed computations over large amounts of data on distributed but networked computers. </p>



<p>Hadoop is popular not least because Apache makes it available to everyone free of charge as open source and because it is written in the well-known Java programming language.</p>



<h2 class="wp-block-heading">What role does Hadoop play in big data?</h2>



<p>Hadoop&#8217;s ability to process large amounts of data of any kind in a structured way, and to do so quickly, makes the software framework an attractive tool for many companies. In particular, the ability to process data from different sources and with different structures in parallel across a cluster is a great benefit, especially for organizations in the business intelligence industry.</p>



<p>In addition, Hadoop makes it possible to solve complex computing tasks in the petabyte range efficiently and, on that basis, to develop new corporate strategies, gather the information needed for important decisions or considerably simplify an organization&#8217;s reporting.</p>



<h2 class="wp-block-heading">Structure</h2>



<p>Hadoop is made up of several building blocks which, when combined, make all the basic functions of the software framework possible. The four central components are:</p>



<ul class="wp-block-list"><li>Hadoop Common,</li><li>Hadoop Distributed File System (HDFS),</li><li>the MapReduce algorithm and</li><li>Yet Another Resource Negotiator (YARN).</li></ul>



<p>Hadoop Common provides the basic functions and tools that all other components build on, including the Java archive (JAR) files. It is connected to the other components via interfaces with defined access rights.</p>



<p>The Hadoop Distributed File System stores the individual data stocks on different systems. According to the project, HDFS can manage data sets in the hundreds of millions.</p>



<p>Hadoop is powered by Google&#8217;s MapReduce algorithm. It enables the software framework to distribute complex computing tasks across several systems, which then process them in parallel. This can increase the speed of data processing enormously.</p>



<p>The MapReduce algorithm is supplemented by Yet Another Resource Negotiator. YARN manages the resources of a cluster by assigning them to the individual tasks.</p>



<h2 class="wp-block-heading">How it works</h2>



<p>As already mentioned, Hadoop is largely based on Google&#8217;s MapReduce algorithm. Central tasks are also handled by the HDFS file system, which is responsible for distributing the data to the individual nodes of the cluster. The MapReduce algorithm, in turn, splits the processing of the data so that it can run in parallel on all nodes. Hadoop then merges the partial results into one overall result. </p>
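


<p>The split into a map step and a reduce step is easiest to see in the classic word count. The following Python sketch only imitates the idea on a single machine and is not Hadoop&#8217;s Java API; the function names <code>map_phase</code>, <code>shuffle</code> and <code>reduce_phase</code> and the two text chunks are chosen for illustration.</p>

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

# Each chunk stands for a block of data that would live on a different node
chunks = ["big data with Hadoop", "Hadoop stores big data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

<p>Here the chunks are processed one after the other. In a real cluster, each map call could run on the node that holds the chunk, and only the grouped intermediate results would travel over the network.</p>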



<p>Hadoop divides the data volumes among individual clusters on its own. Each cluster has a single master node, while the other nodes are subordinate to it and run in slave mode. The slaves store the data, while the master is responsible for replication and thus makes the data available on several nodes. Because it can determine the exact location of a data block at any time, the master protects efficiently against data loss. It also supervises the individual nodes: if a node is unreachable for a longer period, the master automatically locates the affected data blocks on other nodes, replicates them and stores them again. </p>
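


<p>How a master can keep blocks available after a node fails can be sketched with a toy model. The class <code>Master</code>, its methods and the node names are hypothetical, and the placement rule (first free node in alphabetical order) is a simplification assumed for this example, not the strategy HDFS actually uses.</p>

```python
class Master:
    """Toy master node that tracks which worker nodes hold which data block."""

    def __init__(self, nodes, replication=2):
        self.nodes = set(nodes)
        self.replication = replication
        self.locations = {}  # block id -> set of node names

    def store(self, block):
        # Place the block on the first `replication` nodes in sorted order
        self.locations[block] = set(sorted(self.nodes)[: self.replication])

    def node_lost(self, node):
        # Remove the node, then copy every affected block to another live node
        self.nodes.discard(node)
        for block, holders in self.locations.items():
            holders.discard(node)
            spare = sorted(self.nodes - holders)
            while self.replication > len(holders) and spare:
                holders.add(spare.pop(0))

master = Master(["node-a", "node-b", "node-c"])
master.store("block-1")
master.node_lost("node-a")
print(sorted(master.locations["block-1"]))
```

<p>After <code>node-a</code> drops out, the block is held by <code>node-b</code> and <code>node-c</code>, so two copies exist again.</p>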



<p>The following articles also provide more on the subject of data and big data:</p>



<ul class="wp-block-list"><li><a href="https://agile-companies.com/what-is-big-data-definition/">What is big data</a></li><li><a href="https://agile-companies.com/big-data-with-hadoop/">Big data: yesterday, today and tomorrow!</a></li><li><a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a></li><li><a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a></li></ul>






<p>Image source: <a href="https://pixabay.com">pixabay.com</a></p>


<p>The post <a href="https://agile-companies.com/big-data-with-hadoop/">Big data with Hadoop</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What is data mart and data lineage?</title>
		<link>https://agile-companies.com/data-mart-and-data-lineage/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:29:36 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/what-is-data-mart-and-data-lineage/</guid>

					<description><![CDATA[<p>In the age of digitization and big data, a lot revolves around one thing: data. Terms such as data mart and data lineage regularly catch the eye. It is not always clear exactly what these technical terms mean, which is why this article provides a brief overview. What is a data mart? [...]</p>
<p>The post <a href="https://agile-companies.com/data-mart-and-data-lineage/">What is data mart and data lineage?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In the age of digitization and big data, a lot revolves around one thing: data. Terms such as data mart and data lineage regularly catch the eye. It is not always clear exactly what these technical terms mean, which is why this article provides a brief overview.</p>



<h2 class="wp-block-heading">What is a data mart?</h2>



<p>Data marts are a kind of collection point for user-specific data. Data is extracted from large data stocks and made available separately to certain user groups. Data marts thus form a sub-segment of a data warehouse and can help make certain data accessible to users more quickly and with less effort. This saves not only time but also costs.</p>



<h4 class="wp-block-heading">Data Mart vs Data Warehouse</h4>



<p>Both data marts and data warehouses are used to store and manage data records until they are needed. Data warehouses specialize in organizing all of a company&#8217;s data, while data marts organize only the data of individual departments. They are a tool that isolates certain data records and makes them available separately to the respective functional area.</p>



<h4 class="wp-block-heading">Types</h4>



<p>A basic distinction is made between three categories of data mart.</p>



<h4 class="wp-block-heading">Dependent </h4>



<p>Dependent data marts are always tied directly to an enterprise data warehouse and are built according to the top-down principle. Data is first consolidated at a central collection point; selected data records are then extracted and distributed to the data mart intended for them.</p>
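


<p>One simple way to picture a dependent data mart is a database view on top of a warehouse table. The sketch below assumes an in-memory SQLite database; the table <code>orders</code>, the view <code>marketing_mart</code> and the rows are invented, and real data marts are often separate physical stores rather than views.</p>

```python
import sqlite3

# The warehouse table acts as the central collection point
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (department TEXT, customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("marketing", "A", 100.0), ("sales", "B", 250.0), ("marketing", "C", 40.0)],
)
# Dependent data mart: a view that exposes only the marketing department's records
warehouse.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT customer, amount FROM orders WHERE department = 'marketing'"
)
mart_rows = warehouse.execute(
    "SELECT customer, amount FROM marketing_mart ORDER BY customer"
).fetchall()
print(mart_rows)
```

<p>The marketing team queries only its own mart and never sees the sales records, although both live in the same warehouse.</p>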



<h4 class="wp-block-heading">Independent</h4>



<p>An independent data mart, on the other hand, is not linked to a data warehouse and thus forms an autonomous system. Data is obtained from an organization&#8217;s internal and external data sources instead of from the collection point of the data warehouse and is then distributed specifically to the individual data marts. This type of data mart is therefore much easier to implement and particularly helpful when pursuing short-term business goals.</p>



<h4 class="wp-block-heading">Hybrid</h4>



<p>Hybrid data marts describe the connection of dependent and independent data marts in a system by obtaining data from a data warehouse as well as from internal and external sources of a company. This allows you to combine the advantages of both methods and create a complex but clear system.</p>



<h4 class="wp-block-heading">Advantages</h4>



<p>Because of their function as an accelerator when accessing special data sets, data marts offer many advantages.</p>



<ul class="wp-block-list"><li>Minimizing the time it takes to acquire certain data</li><li>Ready for use much faster than an enterprise data warehouse</li><li>Data marts require comparatively little specialist knowledge for implementation</li><li>Inexpensive alternative to an enterprise data warehouse</li><li>Data marts help improve the performance of a data warehouse because they can obtain data with less effort</li><li>Thanks to the data mart, KPIs are easier to monitor</li><li>Data marts support data maintenance by assigning data records to specific departments, which in turn can monitor them independently</li></ul>



<h2 class="wp-block-heading">What is data lineage?</h2>



<p>Data lineage is concerned with the origin of data, which is why the two terms are often used synonymously. Its task is to record changes to and optimizations of data, as well as the development of its elements, in a history. It tracks a data record on its journey from creation through adjustments to its final destination and documents the associated properties along the way. Simply put, data lineage can be thought of as a kind of biography of a data set.</p>
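


<p>The idea of a biography for a data record can be sketched as a value that carries its own history. The class <code>TrackedRecord</code>, the step names and the file name <code>crm_export.csv</code> are assumptions made up for this example; dedicated lineage tools record far more detail.</p>

```python
from dataclasses import dataclass, field

@dataclass
class TrackedRecord:
    """A value together with the history of steps that produced it."""
    value: float
    lineage: list = field(default_factory=list)

    def apply(self, step_name, func):
        # Return a new record and append the step to a copy of the history
        return TrackedRecord(func(self.value), self.lineage + [step_name])

record = TrackedRecord(1999.0, ["loaded from crm_export.csv"])
record = record.apply("converted cents to euros", lambda v: v / 100)
record = record.apply("rounded to one decimal", lambda v: round(v, 1))
print(record.value)
print(record.lineage)
```

<p>Every transformation adds one entry, so the final record can always be traced back step by step to its source.</p>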



<h4 class="wp-block-heading">Advantages</h4>



<p>With its function, data lineage offers many advantages for the user.</p>



<ul class="wp-block-list"><li>Data can be fully monitored at any time</li><li>Increased transparency about the development and history of data sets</li><li>The quality of the data is retained</li><li>Helpful when it comes to confidential data that needs to be protected</li><li>Companies can use data lineage to more easily comply with data-based standards and regulations</li></ul>



<h3 class="wp-block-heading">Beyond data mart and data lineage</h3>



<p>The following articles also provide more on the subject of data and big data:</p>



<ul class="wp-block-list"><li><a href="https://agile-companies.com/what-is-big-data-definition/">What is big data</a></li><li><a href="https://agile-companies.com/big-data-with-hadoop/">Big data: yesterday, today and tomorrow!</a></li><li><a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a></li><li><a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a></li></ul>





<p>The post <a href="https://agile-companies.com/data-mart-and-data-lineage/">What is data mart and data lineage?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What is a data lake?</title>
		<link>https://agile-companies.com/what-is-a-data-lake/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:29:36 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/what-is-a-data-lake/</guid>

					<description><![CDATA[<p>Big data analyses usually require a large amount of data, with all information captured and collected in its raw state. In size, this data store resembles a real lake, which is why the technical term &#8220;data lake&#8221; has become established for it. You can find out exactly what this is all about in [...]</p>
<p>The post <a href="https://agile-companies.com/what-is-a-data-lake/">What is a data lake?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Big data analyses usually require a large amount of data, with all information captured and collected in its raw state. In size, this data store resembles a real lake, which is why the technical term &#8220;data lake&#8221; has become established for it. You can find out exactly what this is all about in this article.</p>



<h2 class="wp-block-heading">Definition</h2>



<p>As a large data store, the data lake manages the entire mass of data in its original form, i.e. in its raw format. It collects information from a wide variety of sources. It makes no difference to the data lake whether the data has a structure or not, and the data does not need to be validated or reformatted beforehand. A data lake is not limited to number- or text-based data: it can also store media such as images and videos.</p>



<p>What appears to be a chaotic collection of data nevertheless follows a system. Even though the data lake takes in all information in its raw state, it structures the data as soon as it is needed and, if necessary, restructures it.</p>
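


<p>Structuring data only when it is needed is often called schema-on-read. The following Python sketch shows the principle under simple assumptions: the list <code>raw_lake</code> stands in for the lake, the two payload formats are invented, and <code>read_temperatures</code> is a hypothetical reader for one specific analysis.</p>

```python
import json

# The lake keeps every payload exactly as it arrived (hypothetical examples)
raw_lake = [
    '{"sensor": "t1", "temp_c": 21.5}',
    "t2;19.0",
    '{"sensor": "t3", "temp_c": 23.0, "note": "calibrated"}',
]

def read_temperatures(raw_items):
    """Apply a structure only at read time (schema-on-read)."""
    rows = []
    for item in raw_items:
        if item.startswith("{"):
            # JSON payload: pick out the two fields this analysis needs
            doc = json.loads(item)
            rows.append((doc["sensor"], float(doc["temp_c"])))
        else:
            # Semicolon-separated payload: sensor name, then temperature
            sensor, temp = item.split(";")
            rows.append((sensor, float(temp)))
    return rows

print(read_temperatures(raw_lake))
```

<p>The stored payloads stay untouched. A different analysis could read the same raw items with a different structure, for example one that also uses the <code>note</code> field.</p>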



<h2 class="wp-block-heading">Use of a data lake</h2>



<p>The many different ways of using the information collected in a data lake, such as flexible analyses, make this large data store extremely attractive. However, certain prerequisites must be met before the system can be used to best effect. </p>



<p>The most important basic function of the data lake is to collect and manage data from a wide variety of sources. By bringing all data together in one place, data silos can be avoided and information is available more quickly. However, given the large amount of data, even a single storage location does not guarantee problem-free data management. Data lakes therefore need common frameworks and a log of the data sets they contain in order to bring more structure to the mass of information. </p>



<p>To meet security and data protection requirements, access controls must also be implemented and the information must be encrypted. In addition, data lakes should always support backing up and restoring data.</p>



<h2 class="wp-block-heading">Advantages and disadvantages</h2>



<p>The use of a data lake is particularly useful when large amounts of data are repeatedly generated that have to be managed. At the same time, however, such a large collection of information can also pose a number of hurdles. </p>



<h4 class="wp-block-heading">Advantages</h4>



<ul class="wp-block-list"><li>Fast and uncomplicated data storage in raw format</li><li>Low requirements with regard to computing power</li><li>Provides the basis for detailed and content-rich analyses</li><li>Many possibilities for evaluating data, since all data is collected without prior sorting</li><li>Big data analytics can be a competitive advantage</li></ul>



<h4 class="wp-block-heading">Disadvantages</h4>



<ul class="wp-block-list"><li>High requirements in terms of data protection and security</li><li>Need for a complex data protection system</li><li>Access rights must be implemented in advance and users checked regularly</li></ul>



<h3 class="wp-block-heading">Conclusion</h3>



<p>As you can see, a data lake is a real asset, especially for companies with large data volumes. Used optimally, it enables real competitive advantages thanks to in-depth big data analyses. At the same time, adequate data protection must be ensured given the amount of data, which can make using a data lake very complex.</p>



<p>The following articles also provide more on the subject of data and big data:</p>



<ul class="wp-block-list"><li><a href="https://agile-companies.com/what-is-big-data-definition/">What is big data</a></li><li><a href="https://agile-companies.com/big-data-with-hadoop/">Big data: yesterday, today and tomorrow!</a></li><li><a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a></li><li><a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a></li></ul>






<p>Image source: <a href="https://pixabay.com">pixabay.com</a></p>


<p>The post <a href="https://agile-companies.com/what-is-a-data-lake/">What is a data lake?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Case study: building a data lake for the use of big data</title>
		<link>https://agile-companies.com/use-of-big-data/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:29:35 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/case-study-building-a-data-lake-for-the-use-of-big-data/</guid>

					<description><![CDATA[<p>Big data is an important topic, as a study by Bitkom shows. In 2018, the association surveyed over 600 companies on trending topics and found the following results: 57 percent are planning investments in big data or are already implementing them. The five top topics are big data (57%), Industry 4.0 (39%), 3D [...]</p>
<p>The post <a href="https://agile-companies.com/use-of-big-data/">Case study: building a data lake for the use of big data</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Big data is an important topic, as <a rel="noreferrer noopener" aria-label="eine Studie der Bitkom (öffnet in neuem Tab)" href="https://www.bi-scout.com/studie-big-data-steht-bei-sechs-von-zehn-unternehmen-an-erster-stelle" target="_blank">a study by Bitkom</a> shows. In 2018, the association surveyed over 600 companies on trending topics and found the following results:</p>



<ul class="wp-block-list"><li>57 percent are planning investments in big data or are already implementing them</li><li>The five top topics are big data (57%), Industry 4.0 (39%), 3D printing (38%), robotics (36%) and VR (25%)</li><li>However, new concepts and technologies such as artificial intelligence and blockchain have so far been used only rarely</li></ul>



<p><strong>Reading tip:</strong> <strong><a href="https://agile-companies.com/what-is-big-data-definition/">What is big data</a></strong></p>



<h2 class="wp-block-heading">Big data is being implemented only hesitantly</h2>



<p>According to the study, the potential of big data is only being used hesitantly. The reasons given are data protection requirements (63%), the technical implementation (54%) and a lack of specialists (42%).</p>



<p><strong>Reading tip:</strong> <strong><a href="https://agile-companies.com/what-does-a-data-scientist-do/">What does a data scientist do?</a></strong></p>



<p>I am currently working on the technical implementation. To use big data in a meaningful way, numerous technical prerequisites have to be in place. I would like to give a practical example that can serve as a starting point for your own projects.</p>



<h2 class="wp-block-heading">Practical example: building a data lake</h2>



<p>In my job, I am often involved in customer projects that aim to set up big data architectures. From the majority of these projects I have derived a kind of blueprint, which I would like to introduce to you today. To make the example clearer, I&#8217;ll write the whole thing as a case study.</p>



<h3 class="wp-block-heading">Initial situation</h3>



<p>The customer is a fictitious large bank. From <strong>different sources</strong>, for example databases and data streams, the system copies data in raw form into a <strong>staging area</strong>. A data stream is a continuous flow of data records whose end usually cannot be foreseen in advance, e.g. bank transfers or payments to an account, readings from a heart rate monitor in a hospital, or temperature measurements from a weather station. Staging ensures that the raw data is saved in its current form with a time stamp. The advantage is that the data is still available if the external data source is lost.</p>
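<p>The staging step described above can be sketched in Python roughly as follows. This is only an illustration, not the customer's implementation: the function name <code>stage_record</code>, the source name and the plain list standing in for redundant storage are all assumptions.</p>

```python
import json
from datetime import datetime, timezone

def stage_record(staging_area, source, raw_payload):
    """Append an untouched copy of a record to the staging area with a time stamp."""
    entry = {
        "source": source,
        "staged_at": datetime.now(timezone.utc).isoformat(),
        "raw": raw_payload,  # stored exactly as received, no normalization
    }
    staging_area.append(entry)
    return entry

# Hypothetical usage: a list stands in for the redundant storage.
staging_area = []
stage_record(staging_area, "payments_stream", '{"account": "DE00", "amount": "12.50"}')
print(json.dumps(staging_area[0], indent=2))
```

<p>Because the payload is kept as received, the staged copy remains usable even if the external source later disappears.</p>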



<p>The staging area saves the data on different hard drives using a secure and redundant storage format. The data is then converted into a uniform format and saved again in the <strong>norming area</strong>. Using <strong>SQL queries</strong>, data is exported in CSV format and saved in SharePoint. Numerous external IT consultants use these exports to prepare weekly reports in PowerPoint and Excel. The reports are filed in folders on SharePoint together with their source data.</p>
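<p>As a rough illustration of the export step, the following Python sketch runs a SQL query and writes the result as CSV. The table <code>transfers</code>, its columns and the helper <code>export_query_to_csv</code> are made up for this example, and an in-memory SQLite database stands in for the norming area.</p>

```python
import csv
import io
import sqlite3

def export_query_to_csv(connection, query):
    """Run a SQL query and return the result as CSV text with a header row."""
    cursor = connection.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()

# Hypothetical normalized table in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE transfers (account TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO transfers VALUES (?, ?)",
    [("A-1", 120.0), ("A-2", 80.5), ("A-1", 30.0)],
)
report_csv = export_query_to_csv(
    connection,
    "SELECT account, SUM(amount) AS total FROM transfers GROUP BY account ORDER BY account",
)
print(report_csv)
```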



<div class="wp-block-image"><figure class="aligncenter is-resized"><img fetchpriority="high" decoding="async" src="https://agile-unternehmen.de/wp-content/uploads/2019/06/big-data-warehouse-1-1024x839.png" alt="Big data warehouse" class="wp-image-7644" width="574" height="470" srcset="https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1-1024x839.png 1024w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1-300x246.png 300w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1-768x629.png 768w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1-175x143.png 175w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1-450x369.png 450w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1-1170x958.png 1170w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-warehouse-1.png 1243w" sizes="(max-width: 574px) 100vw, 574px" /><figcaption>Initial situation of the customer: Reports are generated by consultants in a data warehouse using SQL.</figcaption></figure></div>



<p><strong>Summary of the architecture:</strong></p>



<ul class="wp-block-list"><li>Data comes from different source systems</li><li>At the end of the day, the data is processed, standardized and recorded</li><li>A wide variety of queries are made</li><li>Reports are generated weekly by external service providers (MS Office and Sharepoint as storage location)</li></ul>



<p>This is where we came in. The customer asked my team and me to design a new architecture. The reasons for this were:</p>



<ul class="wp-block-list"><li>The reports require a lot of effort</li><li>Little flexibility (especially for ad hoc requests)</li><li>No versioned raw data</li><li>High costs for external service providers</li><li>Requirements of BCBS 239 (principles for effective risk data aggregation and risk reporting) and MaRisk can no longer be met</li></ul>



<h2 class="wp-block-heading">Target situation</h2>



<p>We then started to redesign the customer&#8217;s architecture. In the first step, we revised the loading processes. On the one hand, we loaded new data into our lake every day (at night) through batch loading processes. On the other hand, the data streams were loaded continuously.</p>
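<p>The difference between the two loading processes can be sketched as follows. This is a simplified illustration: the function names are invented for this example and a plain Python list stands in for the lake.</p>

```python
def batch_load(lake, source_rows):
    """Nightly batch: copy everything that accumulated during the day in one go."""
    lake.extend(source_rows)
    return len(source_rows)

def stream_load(lake, stream):
    """Continuous load: append each record as soon as it arrives."""
    for record in stream:
        lake.append(record)
        yield record

lake = []
loaded = batch_load(lake, ["row-1", "row-2", "row-3"])
for record in stream_load(lake, iter(["event-1", "event-2"])):
    print("streamed", record)
print(loaded, len(lake))
```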



<div class="wp-block-image"><figure class="aligncenter is-resized"><img decoding="async" src="https://agile-unternehmen.de/wp-content/uploads/2019/06/big-data-data-lake-1024x838.png" alt="" class="wp-image-7615" width="573" height="468" srcset="https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake-1024x838.png 1024w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake-300x246.png 300w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake-768x629.png 768w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake-175x143.png 175w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake-450x368.png 450w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake-1170x958.png 1170w, https://agile-companies.com/wp-content/uploads/2019/06/big-data-data-lake.png 1252w" sizes="(max-width: 573px) 100vw, 573px" /><figcaption>New architecture of the customer &#8211; A data lake provides various evaluations throughout the company and enables numerous new potentials</figcaption></figure></div>



<h3 class="wp-block-heading">Sources</h3>



<p>All of this data is saved on hard drives and given a versioning stamp. This concept combines a data lake with metadata management. A data lake is a very large data store, in effect an oversized hard drive that accepts data from a wide variety of sources in its raw format.</p>



<h3 class="wp-block-heading">Data lake</h3>



<p>The data lake helped us to implement data lineage. Think of it this way: data lineage is like a patient record with important information about when data was created, how old it currently is and how it has changed. It is usually presented clearly in the form of a diagram.</p>



<p><strong>Advantage: For each report, we were able to prove exactly which data it was based on at time X.</strong> <strong>With the help of data lineage, we were then able to track how the data changed.</strong></p>
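<p>A lineage record of this kind can be sketched in a few lines of Python. The field names, the hash-based version stamp and the helper functions below are assumptions chosen for illustration and do not describe the tooling used in the project.</p>

```python
import hashlib
from datetime import datetime, timezone

def lineage_entry(dataset, content, derived_from=None):
    """Describe one version of a dataset: when it was created and what it came from."""
    return {
        "dataset": dataset,
        "version": hashlib.sha256(content.encode("utf-8")).hexdigest()[:12],
        "created_at": datetime.now(timezone.utc).isoformat(),
        "derived_from": derived_from,  # version stamp of the parent, if any
    }

def trace(entry, entries_by_version):
    """Follow derived_from links back to the original raw data."""
    chain = [entry["dataset"]]
    while entry["derived_from"] is not None:
        entry = entries_by_version[entry["derived_from"]]
        chain.append(entry["dataset"])
    return chain

raw = lineage_entry("raw_transfers", "A-1;120.0\nA-1;30.0")
normalized = lineage_entry("normalized_transfers", "A-1,150.0", derived_from=raw["version"])
report = lineage_entry("weekly_report", "total=150.0", derived_from=normalized["version"])
entries = {e["version"]: e for e in (raw, normalized, report)}
print(trace(report, entries))
```

<p>Tracing the report back through its parents is what makes it possible to show which data a report was based on.</p>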



<p>Our data lake was based on Hadoop, with modern monitoring by Prometheus, Grafana and Icinga 2. We set up high-availability clusters for this purpose. The Hadoop Distributed File System (HDFS) stores the data distributed across several computers, and the MapReduce programming model splits complex, computationally intensive tasks into many small parts that run on those computers in parallel. This makes evaluations directly on the raw data possible at runtime.</p>
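<p>The MapReduce idea can be illustrated with a small, single-machine Python sketch. It does not use Hadoop itself; the record format and function names are assumptions, and in a real cluster the map and reduce steps would run in parallel on many nodes.</p>

```python
from collections import defaultdict
from functools import reduce

def map_phase(record):
    """Emit one (key, value) pair per raw record."""
    account, amount = record.split(";")
    return account, float(amount)

def shuffle(pairs):
    """Group all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Combine the values of one key into a single result."""
    return reduce(lambda total, value: total + value, values, 0.0)

raw_records = ["A-1;120.0", "A-2;80.5", "A-1;30.0"]
grouped = shuffle(map(map_phase, raw_records))
totals = {key: reduce_phase(values) for key, values in grouped.items()}
print(totals)
```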



<p><em>By the way: Due to the MaRisk and BCBS 239 guidelines for banks, we loaded risk data into a separate cluster. This cluster could only be accessed by authorized persons.</em> <em>The concern was that combining data in the lake would allow so many cross-inferences that we stored certain data separately for security reasons.</em></p>



<p>Next, we wanted to process the raw data from the lake as well. First, we copied the data for standardized reports that are used every week or requested by certain departments to separate hard drives as <strong>data marts</strong>. This enabled us to guarantee access control and improve performance. We thus had fixed (permanent) and volatile (project-based) data marts.</p>



<p><em>Limitation:</em> <em>I realize that data marts are a data warehouse concept, but in our context they were really helpful because we could not rely 100% on the data lake.</em></p>



<p>With the help of new <strong>algorithms</strong> (internal algorithms of the customer), we normalized the data at runtime or for the data marts. Using software, we also tried to gain new insights from the data by forming clusters with artificial intelligence <strong>models</strong>. For example, we had data correlated or examined for deviations. This was done by a specialized AI service provider. Another element is the data catalog. The <strong>data catalog</strong> is a catalog of metadata and shows the presentation rules for all data and the relationships between the various data.</p>



<p><em>Note: The data catalog has the important function that no relationship can be established between certain personal risk data without access to the catalog.</em></p>



<h3 class="wp-block-heading">Evaluation</h3>



<p>Now we come to the logic of <strong>evaluation</strong>. We wanted the various stakeholders in the company to be able to send requests to a <strong>query server</strong> using one of three standards. The requests are distributed by a load balancer and protected by access control. The three standards are:</p>



<ul class="wp-block-list"><li>SQL</li><li>Hive (SQL-like Hadoop compatible language)</li><li>Tableau (software tool)</li></ul>
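<p>How a load balancer and an access check could sit in front of the query servers is sketched below. This is a hypothetical example: the class <code>QueryRouter</code>, the server names, the round-robin strategy and the role-to-interface permissions are assumptions, not details from the project.</p>

```python
from itertools import cycle

ALLOWED_INTERFACES = {"sql", "hive", "tableau"}

class QueryRouter:
    """Round-robin load balancer with a simple role-based access check."""

    def __init__(self, servers, permissions):
        self._servers = cycle(servers)
        self._permissions = permissions  # role mapped to its allowed interfaces

    def route(self, role, interface):
        if interface not in ALLOWED_INTERFACES:
            raise ValueError(f"unsupported interface: {interface}")
        if interface not in self._permissions.get(role, set()):
            raise PermissionError(f"{role} may not use {interface}")
        return next(self._servers)

router = QueryRouter(
    servers=["query-1", "query-2"],
    permissions={"controlling": {"sql"}, "data_scientist": {"sql", "hive", "tableau"}},
)
print(router.route("controlling", "sql"))
print(router.route("data_scientist", "hive"))
```

<p>Checking access before picking a server means that rejected requests never consume capacity on the query servers.</p>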



<h3 class="wp-block-heading">Reporting</h3>



<p>Now we come to the actual report generation for the end user. There are three groups in the company for this purpose:</p>



<ul class="wp-block-list"><li>classic controlling,</li><li>the specialist departments (and project managers) and</li><li>the data scientists.</li></ul>



<p>All three roles can send requests to our <strong>query server</strong>. There is a suitable type of evaluation for each role:</p>



<ul class="wp-block-list"><li>Fixed and automated reports for controlling,</li><li>Interactive and customizable dashboards for the specialist departments and</li><li>Exploratory reports in Tableau for the data scientists.</li></ul>



<p>In summary, there are the standardized automatic reports, which we could view in CSV or Excel, as well as interactive dashboards for individual real-time reports using our own software. Furthermore, a team of data scientists had the goal of using Tableau to gain new knowledge from data (exploratory reports).</p>



<p><strong>Summary of the new architecture:</strong></p>



<ul class="wp-block-list"><li>Raw data is loaded into the data storage (Hadoop cluster)</li><li>MapReduce distributes the evaluation across the cluster</li><li>Evaluations are made directly from the raw data layer</li><li>Transformation only at runtime</li><li>Data catalog for storing data relationships</li><li>Modeling through AI</li><li>Query server allows queries in various languages</li><li>Data lineage for versioning the data</li><li>Automated reports and interactive / exploratory dashboards through in-house development</li></ul>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Big data can significantly change companies and is right at the top of the agenda. The main obstacles, however, are the technical implementation and the preparation of the data. Classical concepts are no longer capable of processing such data, and legal requirements put companies under pressure to store data in a targeted manner and deliver up-to-date reports.</p>



<p>In this case study, I have given an example of a technical implementation that can serve as a starting point for practice. I built a data lake and used various well-known concepts such as Hadoop. My experience shows that implementing these concepts correctly helps to unlock the potential of big data. What matters is drawing meaningful reports and insights from the data.</p>



<p><strong>Reading tip:</strong> <strong><a href="https://agile-companies.com/data-warehouse-work/">Big data benefits</a></strong></p>



<p>Image source:<a href="https://de.freepik.com/fotos-vektoren-kostenlos/geschaeft"> Business photo created by mindandi &#8211; www.freepik.com</a></p>



<p>The post <a href="https://agile-companies.com/use-of-big-data/">Case study: building a data lake for the use of big data</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Big Data Risks &#8211; A Question of Implementation!</title>
		<link>https://agile-companies.com/big-data-risks/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:27:47 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/big-data-risks-a-question-of-implementation/</guid>

					<description><![CDATA[<p>&#8220;Big data creates mixed feelings for many people. The economic opportunities are obvious. But the possibilities of abuse are also evident&#8221; (Computerwoche). Big data is certainly more than just hype and brings numerous new opportunities with it. However, there are also many risks, which are discussed in this article. Reading tip: What [...]</p>
<p>The post <a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>&#8220;Big data creates mixed feelings for many people. The economic opportunities are obvious. But the possibilities of abuse are also evident&#8221; (<a href="https://www.computerwoche.de/a/problemfall-big-data,2546584" target="_blank" rel="noopener">Computerwoche</a>). Big data is certainly more than just hype and brings numerous new opportunities with it. However, there are also many risks, which are discussed in this article.<br /><strong>Reading tip: <a href="https://agile-companies.com/what-is-big-data-definition/" target="_blank" rel="noopener">What is big data</a></strong></p>
<h2>Big Data Risks: Monitoring</h2>
<p>An example can be found in <a href="https://www.computerwoche.de/a/problemfall-big-data,2546584" target="_blank" rel="noopener">Computerwoche</a>, which describes the search for a perpetrator on the autobahn: &#8220;The investigators had installed cameras on seven relevant sections of the autobahn. These read the license plates of all passing automobiles, including those of the vehicles being shot at. In April 2013, the police again received reports of gunfire on trucks within five days, six in total.&#8221;<br />Of course, this massive storage of data allows a high level of surveillance. The verdict of Computerwoche: &#8220;The evaluation of massive amounts of data has an ambivalent character.&#8221;<br />Another example is health insurance companies storing large amounts of data from wearables. The health of the individual becomes transparent, and it is possible to see how often you exercise and move. This creates a permanent feeling of surveillance.</p>
<h2>Big data risks: Sensitive data requires special protection</h2>
<p>&#8220;Dealing with large amounts of data, especially data about internal company relationships, poses technical challenges for companies and IT departments. Accessibility and security must be carefully balanced,&#8221; writes the <a href="https://bigdatablog.de/2015/04/13/big-data-risiken-worauf-bei-der-datenauswertung-geachtet-werden-muss/" target="_blank" rel="noopener">BigDatablog</a>. Sensitive data must therefore be specially secured, not only against abuse but also against manipulation.</p>
<h2>Big data risks: manipulation</h2>
<p>Another risk of big data is its manipulative use. For example, big data can be misused to influence voters in elections. What matters is therefore the sensible and ethical use of the large amount of data.<br />The magazine <a href="https://propagandaschau.wordpress.com/" target="_blank" rel="noopener">Propagandaschau</a> warns: &#8220;Critical contemporaries have always warned, but only now is it slowly becoming apparent, and all the more powerfully, how data collections can be and already are being used to manipulate, specifically and unnoticed, people who previously &#8216;thoughtlessly&#8217; made their thoughts available to third parties.&#8221;<br />The magazine goes on to argue that &#8220;the data is today <em>not</em> primarily used to analyze political moods and then to make laws and politics in the interests of the citizens&#8217; known opinions and interests, but rather to manipulate the surveyed moods and opinions with targeted measures in the interests of the elites.&#8221;</p>
<h2>Big data risks: uselessness</h2>
<p>A final risk is that large amounts of data are stored and even evaluated, but you then sit in front of the results and think: Hmm? What do we do with this now? What does it tell us? When I was still working as an external consultant, I experienced this very often. We evaluated and thought about a lot of things, but hardly ever arrived at a meaningful interpretation.<br />The <a href="http://www.harvardbusinessmanager.de/blogs/a-862657.html" target="_blank" rel="noopener">Harvard Business Manager</a> also warns that managers must be ready and open to experiment: &#8220;Managers and analysts must be able to apply scientific methods in their business area. You need to know what reasonable working hypotheses look like. You also need to understand the basics of experiments and how they are set up.&#8221;</p>
<h2>Conclusion: Big data but with care!</h2>
<p>Given the risks mentioned, it is important to use big data sensibly. This requires guidelines to prevent abuse and manipulation on the one hand, and further training to prevent uselessness on the other, because there must also be a data culture in the company.<br />According to the <a href="http://www.harvardbusinessmanager.de/blogs/a-862657.html" target="_blank" rel="noopener">Harvard Business Manager</a>, great efforts in further training are ultimately needed so that big data leads to more added value. It is about promoting a data-oriented mindset and analytical culture in the company and introducing new technologies.<br /><strong>Tip: <a href="https://www.amazon.de/hz/wishlist/ls/W0ILVF5COVHN?&amp;sort=default?&amp;tag=agileunter-21" target="_blank" rel="noopener">Book suggestions on big data</a></strong><br />Of course, not everything is a risk: big data can offer numerous opportunities, such as well-founded forecasts for decisions and new products as well as an individual approach to customers. Read my follow-up article on this next week.<br /><strong>Reading tip: <a href="https://agile-companies.com/current-studies-on-big-data/">Opportunities of big data</a></strong></p>
<figure id="attachment_4498" aria-describedby="caption-attachment-4498" style="width: 451px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-4498 " src="https://agile-unternehmen.de/wp-content/uploads/2017/11/big-data-chancen-risiken.png" alt="Big Data Chancen Risiken" width="451" height="266" srcset="https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken.png 1088w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-300x177.png 300w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-1024x603.png 1024w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-768x452.png 768w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-175x103.png 175w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-450x265.png 450w" sizes="(max-width: 451px) 100vw, 451px" /><figcaption id="caption-attachment-4498" class="wp-caption-text">Opportunities and risks of big data (own illustration)</figcaption></figure>
<span class="collapseomatic " id="id69f9e846a5a6a"  tabindex="0" title="Verwendete Quellen anzeigen"    >Show sources used</span><div id="target-id69f9e846a5a6a" class="collapseomatic_content ">Image source: <a href="https://www.freepik.com/free-photo/people-financial-results-analyzing-statistics_1145745.htm">Designed by Freepik</a><br /></div>
<p>The post <a href="https://agile-companies.com/big-data-risks/">Big Data Risks &#8211; A Question of Implementation!</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What does a data scientist do?</title>
		<link>https://agile-companies.com/what-does-a-data-scientist-do/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:27:47 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<category><![CDATA[Digital change]]></category>
		<category><![CDATA[digital transformation]]></category>
		<guid isPermaLink="false">https://agile-companies.com/what-does-a-data-scientist-do/</guid>

					<description><![CDATA[<p>The volume of data continues to grow. In fact, data is now said to be the new oil. At the same time, a new job profile has emerged: the data scientist is mentioned more and more often. The portal SAS says: &#8220;Anyone who knows how strategically important knowledge [...]</p>
<p>The post <a href="https://agile-companies.com/what-does-a-data-scientist-do/">What does a data scientist do?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>The volume of data continues to grow. In fact, data is now said to be the new oil. At the same time, a new job profile has emerged: the data scientist is mentioned more and more often. The portal <a href="https://www.sas.com/de_de/news/press-releases/2014/december/pm141202.html" target="_blank" rel="noopener">SAS</a> says: &#8220;Anyone who knows how strategically important knowledge can be drawn from large amounts of data and can also convey this has a key position in the company as a consultant for top management.&#8221;<br />
But if you look at job advertisements, you will find a wide range of descriptions and ask yourself: What does a data scientist do? This article is intended to provide an answer.<br />
<strong>Reading tip: <a href="https://agile-companies.com/what-is-big-data-definition/" target="_blank" rel="noopener">What is big data</a></strong></p>
<h2>What should a data scientist be able to do?</h2>
<p>According to the job advertisements, a data scientist should usually bring the following:</p>
<ul>
<li>Analytical talent</li>
<li>Domain expertise</li>
<li>Communication skills</li>
<li>Curiosity for research</li>
<li>Coordination talent</li>
</ul>
<p>First, the data scientist must recognize relationships in large amounts of data and be able to analyze them. He should also have business and specialist knowledge in order to understand the problems of the specialist departments. With his communication skills, he can talk with them in depth and stay in constant contact. His curiosity also allows him to solve difficult problems and work with hypotheses. Finally, he is also a project manager and has to expand, manage and maintain the database.</p>
<p><figure id="attachment_4472" aria-describedby="caption-attachment-4472" style="width: 583px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="wp-image-4472" src="https://agile-unternehmen.de/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1.png" alt="was macht ein data scientist-1" width="583" height="345" srcset="https://agile-companies.com/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1.png 1084w, https://agile-companies.com/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1-300x177.png 300w, https://agile-companies.com/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1-1024x606.png 1024w, https://agile-companies.com/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1-768x454.png 768w, https://agile-companies.com/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1-175x103.png 175w, https://agile-companies.com/wp-content/uploads/2017/11/was-macht-ein-data-scientist-1-450x266.png 450w" sizes="auto, (max-width: 583px) 100vw, 583px" /><figcaption id="caption-attachment-4472" class="wp-caption-text">Skills and tasks of a data scientist (own illustration)</figcaption></figure></p>
<h2>What does a data scientist do?</h2>
<p>There are many names for the new job profiles. <a href="https://www.computerwoche.de/a/big-data-jobs-wer-macht-was,3215345" target="_blank" rel="noopener">Computerwoche</a> has already broken some of them down and defines them as follows:</p>
<ul>
<li>The (Big) Data Engineer is the master of the data supply.</li>
<li>The management scientist is the mediator between the departmental worlds.</li>
<li>The data scientist provides answers to analytical questions based on data.</li>
<li>The data steward is responsible for monitoring data quality and integrity.</li>
</ul>
<p>The data engineer is responsible for merging the data and knows where the data is and how it is merged. The management scientist then analyzes this data and defines the actual problems. About the data scientist, <a href="https://www.computerwoche.de/a/big-data-jobs-wer-macht-was,3215345" target="_blank" rel="noopener">Computerwoche</a> says: &#8220;The main task of the data scientist is to generate answers to analytical questions from data &#8211; with the help of analytical methods from the areas of statistics, machine learning or operations research.&#8221; The data steward monitors the incoming data and ensures that it is technically correct.<br />
So in the end there are numerous job descriptions, and these terms are not always used consistently or clearly separated. The breakdown above is therefore only a rough guide. I don&#8217;t think that every company separates these roles so strictly; often everything is grouped under the umbrella of the data scientist or data analyst. The following figure shows the fields of application and answers the question: What does a data scientist do?</p>
<p><figure id="attachment_4477" aria-describedby="caption-attachment-4477" style="width: 1500px" class="wp-caption alignnone"><img loading="lazy" decoding="async" class="wp-image-4477 size-full" src="https://agile-unternehmen.de/wp-content/uploads/2017/12/was-ist-ein-data-scientist.png" alt="was-ist-ein-data-scientist" width="1500" height="243" srcset="https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist.png 1500w, https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist-300x49.png 300w, https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist-1024x166.png 1024w, https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist-768x124.png 768w, https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist-175x28.png 175w, https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist-450x73.png 450w, https://agile-companies.com/wp-content/uploads/2017/12/was-ist-ein-data-scientist-1170x190.png 1170w" sizes="auto, (max-width: 1500px) 100vw, 1500px" /><figcaption id="caption-attachment-4477" class="wp-caption-text">Tasks of the data scientist in a process chain (own illustration)</figcaption></figure></p>
<h2>Conclusion: what does a data scientist do?</h2>
<p>There are numerous terms and answers to the question of what a data scientist does, but in the end a general picture emerges. On the one hand, the data scientist is a scientist who solves business problems with the help of data and, on the other hand, he is a project manager who manages the database in the company.<br />
I hope this article has shed some light on the topic, and I look forward to your comments on what the data scientists in your environment are doing. I worked with a few during my time in consulting and can also see what my big data colleagues at the chair are researching. I learned from them as well and used a <a href="https://agile-companies.com/tips-on-the-quantitative-survey/" target="_blank" rel="noopener">quantitative analysis</a> of data in my doctorate.<br />
<strong>Tip: <a href="https://www.amazon.de/hz/wishlist/ls/W0ILVF5COVHN?&amp;sort=default?&amp;tag=agileunter-21" target="_blank" rel="noopener">Book suggestions on big data</a></strong><br />
<span class="collapseomatic " id="id69f9e846a66ee"  tabindex="0" title="Verwendete Quellen anzeigen"    >Show sources used</span><div id="target-id69f9e846a66ee" class="collapseomatic_content "><br />
Image source:<a href="https://www.freepik.com/free-photo/startup-business-teamwork-meeting-concept_1235178.htm"> Designed by Freepik</a><br />
</div>
<p>The post <a href="https://agile-companies.com/what-does-a-data-scientist-do/">What does a data scientist do?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Big Data Opportunities &#8211; Is Data the New Oil?</title>
		<link>https://agile-companies.com/big-data-opportunities/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:27:43 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<guid isPermaLink="false">https://agile-companies.com/big-data-opportunities-is-data-the-new-oil/</guid>

					<description><![CDATA[<p>Whether we drive a car, surf the Internet, take photos or videos with our smartphones or operate machines at work: data is generated. The amount is so gigantic that experts speak of &#8220;big data&#8221; (source: Nordbayerischer Kurier). The opportunities of big data are great and it is certainly also a hype, but it [...]</p>
<p>The post <a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Whether we drive a car, surf the Internet, take photos or videos with our smartphones or operate machines at work: data is generated. The amount is so gigantic that experts speak of &#8220;big data&#8221; (source: <a href="http://www.nordbayerischer-kurier.de/nachrichten/chancen-und-risiken-von-big-data_526872" target="_blank" rel="noopener">Nordbayerischer Kurier</a>). The opportunities of big data are great and it is certainly also a hype, but it opens up almost infinite possibilities. This article takes a closer look at the opportunities offered by big data.<br />
<strong>Reading tip: <a href="https://agile-companies.com/what-is-big-data-definition/" target="_blank" rel="noopener">What is big data</a></strong></p>
<h2>Data is the new oil &#8211; big data opportunities</h2>
<p>&#8220;The collection and analysis of data nowadays follows the same principle as it did 100 years ago, during the oil boom. You try to tap into as many sources as possible in order to make a profit.&#8221; (Source: <a href="https://www.zensations.at/de/blog/chancen-und-risiken-von-big-data" target="_blank" rel="noopener">Zensations</a>). The magazine also cites Google and Facebook as examples of firms that buy companies just to get their data. &#8220;We are trying to evaluate a large number of touchpoints and use the data obtained from them to create prognoses for action and consequently to derive new products and services.&#8221; (Source: <a href="https://www.zensations.at/de/blog/chancen-und-risiken-von-big-data" target="_blank" rel="noopener">Zensations</a>).</p>
<h2>Addressing customers</h2>
<p>Big data makes it possible to understand customers better and then to address them in a more targeted manner. Customers now use smartphones and tablets on the move and at any time of day, so traditional messaging is no longer adequate. A customer on a smartphone (possibly on the train) must be addressed differently from a customer sitting at a desk at home with a PC.<br />
Data can help to address customers precisely and in real time. Knowledge of the purchase history can also bring significant added value. All of this is data that often already exists in the company but is not yet used to address customers.</p>
<h2>Forecasts</h2>
<p>Big data can be used to make more accurate forecasts. Of course, reports have been created for a long time, and decisions have been based on them. But with new methods such as <a href="https://agile-companies.com/evaluation-and-find-experts/" target="_blank" rel="noopener">qualitative evaluation</a>, knowledge can now be obtained directly from these reports.<br />
With the help of these forecasts, more informed decisions can be made. Coupled with a pinch of gut feeling and some experience, such decisions can often be very good, so forecasts are a useful addition. In particular, the relationships between variables can inform a decision: do customers who order in the morning buy more, or those who arrived via a landing page? These are relationships that can perhaps only be revealed by a big data tool.</p>
<h2>Development</h2>
<p>Big data can be used to create evaluations whose results flow into the development of products and services. For example, vehicles now generate a lot of data that is used to improve the driving experience, and Spotify recommends the music you want to listen to. Spotify checks, for instance, whether you prefer to listen to quiet music in the evening and rock&#8217;n&#8217;roll in the morning to wake up.<br />
As these examples show, an existing product can be improved, but completely new products can also be created. In this way, apps or products can be completely personalized. New products such as data analytics tools or distributed data storage systems are also coming onto the market. Data analysts are now in demand as well, and the job profiles cover almost everything from building Excel tables to server administration to software development.</p>
<h2>Conclusion</h2>
<p>&#8220;Business e-mails, networked company vehicles, incoming orders in online stores, keyword scanning in social media groups, the acquisition of location-based data, video recordings from surveillance cameras, measured values from machines &#8211; countless data points are generated every second in a company.&#8221; (Source: <a href="https://www.t-systems.com/de/de/loesungen/digitalisierung/digitialisierung-themen/big-data/advanced-analytics-238656" target="_blank" rel="noopener">T-Systems</a>).<br />
<strong>Tip: <a href="https://www.amazon.de/hz/wishlist/ls/W0ILVF5COVHN?&amp;sort=default?&amp;tag=agileunter-21" target="_blank" rel="noopener">Book suggestions</a> on big data</strong><br />
This potential can be used to address customers more specifically, to make forecasts for well-founded decisions and to develop new products. All of these are opportunities offered by big data, and they enable companies to stand out in the market.<br />
<strong>Reading tip: <a href="https://agile-companies.com/big-data-risks/" target="_blank" rel="noopener">Big data risks</a></strong><br />
Big data therefore offers great potential for companies, but of course it also harbors risks such as bad investments or misuse. To see the other side as well, I recommend taking a look at my follow-up article.</p>
<p><figure id="attachment_4498" aria-describedby="caption-attachment-4498" style="width: 431px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="wp-image-4498" src="https://agile-unternehmen.de/wp-content/uploads/2017/11/big-data-chancen-risiken.png" alt="Big data opportunities and risks" width="431" height="254" srcset="https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken.png 1088w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-300x177.png 300w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-1024x603.png 1024w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-768x452.png 768w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-175x103.png 175w, https://agile-companies.com/wp-content/uploads/2017/11/big-data-chancen-risiken-450x265.png 450w" sizes="auto, (max-width: 431px) 100vw, 431px" /><figcaption id="caption-attachment-4498" class="wp-caption-text">Opportunities and risks of big data (own illustration)</figcaption></figure></p>
<p>Image source: <a href="https://www.freepik.com/free-photo/people-financial-results-analyzing-statistics_1145745.htm">Designed by Freepik</a></p>
<p>The post <a href="https://agile-companies.com/big-data-opportunities/">Big Data Opportunities &#8211; Is Data the New Oil?</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What is big data definition</title>
		<link>https://agile-companies.com/what-is-big-data-definition/</link>
		
		<dc:creator><![CDATA[Dr. Dominic Lindner]]></dc:creator>
		<pubDate>Sun, 21 Mar 2021 17:27:39 +0000</pubDate>
				<category><![CDATA[Big data]]></category>
		<category><![CDATA[Digital change]]></category>
		<category><![CDATA[digital transformation]]></category>
		<category><![CDATA[Medium-sized companies - Consulting]]></category>
		<guid isPermaLink="false">https://agile-companies.com/what-is-big-data-definition/</guid>

					<description><![CDATA[<p>The mountain of data available on the Internet and in companies &#8211; a phenomenon known as big data &#8211; is getting bigger, more confusing and harder to process. Ever more technologically sophisticated tools and programs are intended to tame the flood of data (source: Big Data Insider). The flood of data is getting [...]</p>
<p>The post <a href="https://agile-companies.com/what-is-big-data-definition/">What is big data definition</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>The mountain of data available on the Internet and in companies &#8211; a phenomenon known as big data &#8211; is getting bigger, more confusing and harder to process. Ever more technologically sophisticated tools and programs are intended to tame the flood of data (source: <a href="http://www.bigdata-insider.de/was-ist-big-data-a-562440/" target="_blank" rel="noopener">Big Data Insider</a>).<br />
The flood of data keeps growing and presents companies with the challenge of storing, preserving and, above all, evaluating it. <a href="http://www.bigdata-insider.de/was-ist-big-data-a-562440/" target="_blank" rel="noopener">Big Data Insider magazine</a> continues: <em>On the one hand, the term describes the ever more rapidly growing amounts of data; on the other hand, it is also about new and explicitly powerful IT solutions and systems with which companies can process the flood of information to their advantage.</em><br />
But many readers still wonder: what is big data? Is it just a large Excel spreadsheet or a big pile of paper documents? And if not, what is it? I would like to get to the bottom of this question in this article.</p>
<h2>What is big data</h2>
<p>If you enter the search term on Google, you will find the following definition from <a href="http://www.searchenterprisesoftware.de/definition/Big-Data" target="_blank" rel="noopener">SearchEnterpriseSoftware</a>: <b>Big data</b> is a general term used to describe the large amounts of unstructured and semi-structured data that companies produce on a daily basis. Loading this data into a relational database for analysis takes a lot of time and money.<br />
The <a href="http://wirtschaftslexikon.gabler.de/Definition/big-data.html" target="_blank" rel="noopener">Gabler Wirtschaftslexikon</a> also provides a definition: &#8220;Big Data&#8221; refers to large amounts of data that originate from areas such as the Internet and mobile communications, the financial industry, the energy industry, health care and transport, and from sources such as intelligent agents, social media, credit and customer cards, smart metering systems, assistance devices, surveillance cameras, aircraft and vehicles, and that are stored, processed and evaluated with special solutions.<br />
A great explanation can also be found at <a href="http://praxistipps.chip.de/was-ist-big-data-einfach-erklaert_41589" target="_blank" rel="noopener">CHIP</a>:</p>
<ul class="List List-Divider List--Unordered bullet">
<li>A large amount of data is referred to as big data if it is too large or too <span class="vm-hook-outer vm-hook-default"><span class="vm-hook">complex</span></span> to process by hand. This is especially true for data that is constantly changing.</li>
<li>Big data can be harmless data from climate research. However, data about people is also collected: the communication, consumer or surfing behavior of Internet users. You can see the effects of big data analysis every day <span class="vm-hook-outer vm-hook-default"><span class="vm-hook">on the Internet</span></span>. A typical example is personalized advertising.</li>
</ul>
<p>In summary, big data is first of all a large amount of data that is unstructured and cannot be evaluated by hand.</p>
<h2>Big data &#8211; pros and cons</h2>
<p>Large amounts of data offer our society a number of new possibilities, at least if we are able to evaluate them. I found some great examples at <a href="http://de.dice.com/nachrichten/5-beispiele-fuer-die-nutzung-von-big-data/" target="_blank" rel="noopener">Dice</a> that I would like to show here.<br />
First, crime can be fought more effectively. As Dice says, it may not be quite like the movie Minority Report, which shows a society where police officers arrest individuals for crimes they have not yet committed. But massive amounts of data are in fact already helping regional and local authorities identify difficulties before they become a major problem.<br />
Furthermore, diseases can be predicted. As Dice says: predicting events based on existing data could enable individual medical care for each patient. By analyzing digitally recorded medical data and similar disease courses in other patients, personal disease risk profiles can be created. Doctors could then prescribe preventative treatments or review related symptoms.<br />
Another great example is Netflix. Dice writes: for Netflix, the visualization of data is of the utmost importance for continuing its success story. Netflix can use data calculations to determine what viewers want to see and how they would like the content to be presented.<br />
So there are numerous great examples. For more, you can continue reading at <a href="http://de.dice.com/nachrichten/5-beispiele-fuer-die-nutzung-von-big-data/" target="_blank" rel="noopener">Dice</a>. However, big data is not always a success factor, which is why I would also like to present the downsides.<br />
According to CIO magazine, the criticism of big data &#8220;<em>is probably due to the &#8216;algorithm weakness&#8217; widespread among machines, i.e. the inability to draw the right conclusions from a lot of collected information. That being said, there are two main reasons why companies do not benefit, or do not benefit enough, from big data. The first: with the help of data analysis, you arrive at results that you could have had with less big data.&#8221;</em><br />
<em>The second: big data produces results and ideas that, for whatever reason, cannot be implemented in practice. A large US retailer found in a model test that sales increase if you put a special-offer product on the shelves a while before the price is reduced and leave it there when the offer price is no longer valid</em> (<a href="https://www.cio.de/a/warum-big-data-oft-nutzlos-ist,3103705" target="_blank" rel="noopener">CIO</a>).<br />
In this section we have seen some good examples of big data, but also some that may not be quite as meaningful. So, as always, it is up to us to use big data correctly and not to adopt it merely because it is hyped.</p>
<h2>AI, deep learning and machine learning</h2>
<p>One potential that the mastery of large amounts of data brings with it is the ability to give machines intelligence, i.e. artificial intelligence. A machine could evaluate the controlling report itself and initiate measures, right?<br />
First, there is the term artificial intelligence. According to <a href="http://t3n.de/news/ai-machine-learning-nlp-deep-learning-776907/" target="_blank" rel="noopener">t3n</a>: <em>All technologies used to deliver feats of intelligence that were previously reserved for humans fall under the generic term AI.</em><br />
Then there is machine learning: machine learning describes mathematical techniques that enable a system, i.e. a machine, to independently generate knowledge from experience (<a href="http://t3n.de/news/ai-machine-learning-nlp-deep-learning-776907/" target="_blank" rel="noopener">t3n</a>). Deep learning goes one step further: <strong>“Deep Learning” with artificial neural networks</strong> is a particularly efficient method of continuous machine learning based on statistical analysis of large amounts of data (big data) and the most important future technology within AI (<a href="http://t3n.de/news/ai-machine-learning-nlp-deep-learning-776907/" target="_blank" rel="noopener">t3n</a>).</p>
<h2>Conclusion: what is big data?</h2>
<p>Big data is a large amount of unstructured data that we cannot evaluate by hand. Mastering such amounts of data can unlock great potential, but it can also &#8220;backfire&#8221;. Part of this potential lies in new methods such as artificial intelligence, which can offer an enormous market advantage. I hope to have given an insight into the topic with this article and I am looking forward to my first article on this topic myself.</p>
<h2>Learn more about: what is big data</h2>
<p>If you would like to find out more about what big data is, you are welcome to take part in the <a href="https://agile-companies.com/digitization-of-work-legal-framework/">round tables</a> and discuss relevant topics with me and other experts. Or write in the comments how you are dealing with this trend. <a href="https://agile-companies.com/seedingup-influencer-marketing-for-companies/" target="_blank" rel="noopener">Also read my article</a> on big data.<br />
<strong>Tip: <a href="https://www.amazon.de/hz/wishlist/ls/W0ILVF5COVHN?&amp;sort=default?&amp;tag=agileunter-21" target="_blank" rel="noopener">Book suggestions</a> on big data</strong><br />
Image source: <a href="http://www.freepik.com/free-photo/mountain-books_908562.htm">Designed by Freepik</a></p>
<p>The post <a href="https://agile-companies.com/what-is-big-data-definition/">What is big data definition</a> first appeared on <a href="https://agile-companies.com">agile Companies</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
