Big Data
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many entries (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. The term is sometimes used loosely, partly because it lacks a formal definition; the best interpretation is that it is a body of information so large that it cannot be comprehended when examined only in small amounts.
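
The effect of adding attributes on chance findings can be illustrated with a short calculation. The sketch below is an assumption-laden illustration, not something taken from the source: it assumes every attribute is tested independently at significance level `alpha` and that none has a real effect. It computes the probability of at least one false positive (the family-wise error rate, a quantity related to but distinct from the false discovery rate). The function names are hypothetical.

```python
def expected_false_positives(num_attributes: int, alpha: float = 0.05) -> float:
    """Expected number of attributes flagged by chance alone."""
    return num_attributes * alpha


def prob_any_false_positive(num_attributes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of the independent tests is a false positive."""
    return 1.0 - (1.0 - alpha) ** num_attributes


if __name__ == "__main__":
    # Illustrative attribute counts only; alpha = 0.05 is an assumed threshold.
    for m in (1, 10, 100, 1000):
        print(m, expected_false_positives(m), round(prob_any_false_positive(m), 4))
```

Under these assumptions, the chance of at least one spurious finding rises quickly as columns are added, which is why wide data sets usually call for a multiple-testing correction.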
Big data analysis challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. The analysis of big data presents challenges in sampling, which previously limited analysis to observations and samples. A fourth concept, veracity, refers to the quality or insightfulness of the data. Without sufficient investment in expertise for big data veracity, the volume and variety of data can produce costs and risks that exceed an organization’s capacity to create and capture value from big data.
![](https://agrichain.id/wp-content/uploads/2023/12/bd02.jpg )
Definition
![](https://agrichain.id/wp-content/uploads/2023/12/bd03.jpg)
![](https://agrichain.id/wp-content/uploads/2023/12/bd04-1.webp )
Big data vs. business intelligence
![](https://agrichain.id/wp-content/uploads/2023/12/bd05-e1701598280129.webp)
- Business intelligence applies mathematical tools and descriptive statistics to data with high information density to measure things, detect trends, etc.
- Big data uses mathematical analysis, optimization, inductive statistics, and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density to reveal relationships and dependencies, or to perform predictions of outcomes and behaviors.
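
The contrast between the two bullets above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data values are invented, the helper names (`describe`, `fit_line`, `predict`) are hypothetical, and a simple least-squares line stands in for the much richer inductive methods the text mentions.

```python
from statistics import mean


def describe(values):
    """Descriptive statistics: summarize what was observed."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Inductive step: use the fitted relationship on an unseen input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical measurements: an input quantity and an observed outcome.
    inputs = [10, 20, 30, 40, 50]
    outcomes = [2.1, 2.9, 4.2, 4.8, 6.0]
    print(describe(outcomes))          # what happened
    model = fit_line(inputs, outcomes)
    print(predict(model, 60))          # what is likely to happen
```

`describe` only reports on data already seen, while `fit_line` and `predict` infer a relationship and extrapolate from it, which is the distinction the bullets draw.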
Characteristics
![](https://agrichain.id/wp-content/uploads/2023/12/bd07.webp)
![](https://agrichain.id/wp-content/uploads/2023/12/bd08-e1701600986777.jpg)
![](https://chain.agrindo.net/wp-content/uploads/2023/12/veracity-e1701601185853.jpg)
![](https://agrichain.id/wp-content/uploads/2023/12/bd12-e1701602153889.jpg)
![](https://agrichain.id/wp-content/uploads/2023/12/bd10-e1701601715633.webp )
[Figure: Big data globally, 2020–2030. Panels: global data volume (zettabytes); big data market ($ billion); business data analytics ($ billion).]
Architecture
![](https://agrichain.id/wp-content/uploads/2023/12/bd13.jpeg)
![](https://agrichain.id/wp-content/uploads/2023/12/bd14-e1701604141811.jpg)
![](https://agrichain.id/wp-content/uploads/2023/12/bd15.jpg)
Technologies
A 2011 McKinsey Global Institute report characterizes the main components and ecosystem of big data as follows:
- Techniques for analyzing data, such as A/B testing, machine learning, and natural language processing
- Big data technologies, like business intelligence, cloud computing, and databases
- Visualization, such as charts, graphs, and other displays of the data
Multidimensional big data can also be represented as OLAP data cubes or, mathematically, tensors. Array database systems have set out to provide storage and high-level query support for this data type. Additional technologies being applied to big data include efficient tensor-based computation (such as multilinear subspace learning), massively parallel-processing (MPP) databases, search-based applications, data mining, distributed file systems, distributed caches (e.g., burst buffer and Memcached), distributed databases, cloud and HPC-based infrastructure (applications, storage, and computing resources), and the Internet. Although many approaches and technologies have been developed, it remains difficult to carry out machine learning with big data.
Some MPP relational databases can store and manage petabytes of data. Implicit is the ability to load, monitor, back up, and optimize the use of the large data tables in the RDBMS.
DARPA’s Topological Data Analysis program seeks the fundamental structure of massive data sets, and in 2008 the technology went public with the launch of a company called “Ayasdi”.
Practitioners of big data analytics are generally hostile to slower shared storage, preferring direct-attached storage (DAS) in its various forms, from solid-state drives (SSD) to high-capacity SATA disks buried inside parallel processing nodes. Shared storage architectures, namely storage area network (SAN) and network-attached storage (NAS), are perceived as relatively slow, complex, and expensive. These qualities are not consistent with big data analytics systems, which thrive on system performance, commodity infrastructure, and low cost.
Real or near-real-time information delivery is one of the defining characteristics of big data analytics, so latency is avoided whenever and wherever possible. Data in direct-attached memory or disk is good; data on memory or disk at the other end of a Fibre Channel (FC) SAN connection is not. The cost of a SAN at the scale needed for analytics applications is much higher than that of other storage techniques.
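
To make one of the analysis techniques named in the McKinsey list concrete, A/B testing can be sketched as a comparison of two conversion rates. The example below is a hedged illustration rather than a method from the report: it assumes a pooled two-proportion z-test with a normal approximation, and the function name and the visitor counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Assumes large samples so the normal approximation is reasonable.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conversion rates of exactly 0 or 1 cannot be tested this way")
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


if __name__ == "__main__":
    # Hypothetical experiment: variant A vs. variant B, 1000 visitors each.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone; with very large data sets even tiny differences become statistically significant, so the effect size should be read alongside it.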
![](https://agrichain.id/wp-content/uploads/2023/12/bd16.jpeg)
![](https://agrichain.id/wp-content/uploads/2023/12/bd17.jpeg)
Applications
![](https://agrichain.id/wp-content/uploads/2023/12/bd19.jpeg)