Advanced Big Data Concepts
In this final section of the Big Data tutorial, let us walk through some advanced Big Data concepts, including the following:
- Data Flow Computing
- Difference between Control Flow Computing and Data Flow Computing
- Natural Language Processing
- Visualizing Large Data Sets with D3
Data Flow Computing
Data flow computing is a comparatively new paradigm in Big Data technology.
In data flow computing, the program source is transformed into a dataflow engine configuration file, which describes the operations, layout, and connections of a dataflow engine. Data is streamed from memory onto the chip, where operations are performed, and is forwarded directly from one computational unit (“dataflow core”) to another as the results are needed, without being written to off-chip memory until the chain of processing is complete.
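The streaming idea above can be imitated in software. The following is only a minimal sketch in Python, assuming generators as a stand-in for hardware dataflow cores; the names `core`, `build_pipeline`, and `run` are hypothetical and do not come from any real dataflow toolchain. Each stage forwards a value to the next stage as soon as it is produced, and results are collected only at the end of the chain.

```python
from typing import Callable, Iterable, Iterator, List


def core(op: Callable[[float], float], upstream: Iterable[float]) -> Iterator[float]:
    """A hypothetical 'dataflow core': applies op to each value as it arrives."""
    for value in upstream:
        # The result is handed straight to the next core; nothing is stored here.
        yield op(value)


def build_pipeline(source: Iterable[float]) -> Iterator[float]:
    """Plays the role of the configuration: which cores exist and how they connect."""
    scaled = core(lambda x: x * 2.0, source)    # core 1: multiply by 2
    shifted = core(lambda x: x + 1.0, scaled)   # core 2: add 1
    squared = core(lambda x: x * x, shifted)    # core 3: square
    return squared


def run(source: Iterable[float]) -> List[float]:
    """Stream the input through the chain and store results only at the very end."""
    return list(build_pipeline(source))


result = run([1.0, 2.0, 3.0])
print(result)  # [9.0, 25.0, 49.0]
```

Note that `build_pipeline` performs no computation itself. It only wires the stages together, which is loosely analogous to describing the layout and connections of an engine rather than listing instructions.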
Difference between Control Flow Computing and Data Flow Computing
Data flow computing differs from control flow computing, where the program source is transformed into a list of instructions for a particular processor, which is then loaded into the memory attached to the processor. Data and instructions are read from memory into the processor core, where operations are performed and the results are written back to memory. Modern processors contain many levels of caching, forwarding, and prediction logic to improve the efficiency of this paradigm; however, the model is inherently sequential, with performance limited by the latency of data movement in this loop.
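To make the control flow loop concrete, here is a small illustrative sketch in Python of a toy processor model. It assumes a made-up three-instruction program and a dictionary standing in for memory; `PROGRAM`, `execute`, and the opcode names are hypothetical. Every instruction reads its operand from memory and writes its result back, so the memory traffic grows with each step of the computation.

```python
from typing import Dict, List, Tuple

# (opcode, source address, constant, destination address); a made-up toy format.
Instruction = Tuple[str, str, float, str]

# Computes y = (2 * x + 1) ** 2 as a sequential list of instructions.
PROGRAM: List[Instruction] = [
    ("mul", "x", 2.0, "t1"),
    ("add", "t1", 1.0, "t2"),
    ("sq", "t2", 0.0, "y"),
]


def execute(program: List[Instruction], memory: Dict[str, float]) -> int:
    """Run instructions one at a time and return the number of memory accesses."""
    accesses = 0
    for opcode, src, const, dst in program:
        operand = memory[src]          # read the operand from memory
        accesses += 1
        if opcode == "mul":
            value = operand * const
        elif opcode == "add":
            value = operand + const
        elif opcode == "sq":
            value = operand * operand
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        memory[dst] = value            # write the result back to memory
        accesses += 1
    return accesses


memory = {"x": 3.0}
accesses = execute(PROGRAM, memory)
print(memory["y"], accesses)  # 49.0 6
```

In this toy model the intermediate values `t1` and `t2` make a round trip through memory even though they are needed only by the next instruction. That round trip is the data movement that the dataflow approach tries to avoid.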
This is a complete paradigm shift in data processing. In the control flow model, time is spent pushing and pulling data to and from memory, computing the addresses to read and write, and synchronizing threads, work that is said to drain as much as 95% of CPU cycles. Data flow computing instead lets the programmer write code that does not control the flow of data but configures the computing environment (“programming in space”), so that data flows from the input port to the output port in a fraction of the current processing time. Speed is then limited only by the characteristics of the application, with substantial savings in space and power. This is a 180-degree shift from the single Big Data lake concept that we architect today.
The major hurdle is the complexity of the programming involved, since the platform has to be tailored to each application. I am sure millennials are up for the challenge of processing an exabyte of data in a few seconds!