Big Data Analytics
Data analytics is a mature and well-developed academic and professional discipline. The term “data analysis” was coined in 1962, though people have been analyzing data using statistics for thousands of years, going back to the ancient Egyptians. Big data analytics is a more recent phenomenon: the term “big data” was coined around 2000.
Consider four of the V’s of big data (many articles and papers add other V-words to this list):
- Volume—the amount of data the world is producing is growing exponentially.
- Velocity—the speed at which that data is being produced, the speed at which it moves through organizations and the speed at which data changes are growing quickly.
- Variety—data used to be alphanumeric (that is, consisting of alphabetic characters, digits, punctuation and some special characters); today it also includes images, audio, video and data from an exploding number of Internet of Things sensors in our homes, businesses, vehicles, cities and more.
- Veracity—the validity of the data—is it complete and accurate? Can we trust that data when making crucial decisions? Is it real?
Most data is now being created digitally in a variety of types, in extraordinary volumes and moving at astonishing velocities. Moore’s Law and related observations have enabled us to store data economically and to process and move it faster—and all at rates growing exponentially over time. Digital data storage has become so vast in capacity, cheap and small that we can now conveniently and economically retain all the digital data we’re creating. That’s big data.
The following Richard W. Hamming quote, although from 1962, sets the tone for the rest of this article:
“The purpose of computing is insight, not numbers.”
[Source: https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/]
Data science is producing new, deeper, subtler and more valuable insights at a remarkable pace. It’s truly making a difference. Big data analytics is an integral part of the answer. We address big data infrastructure in Chapter 16 with hands-on case studies on NoSQL databases, Hadoop MapReduce programming, Spark, real-time Internet of Things (IoT) stream programming and more.
Turck, M., and J. Hao, Great Power, Great Responsibility: The 2018 Big Data & AI Landscape, http://mattturck.com/big-data-2018/
Data Science and Big Data Are Making a Difference: Use Cases
Lewis, M., Moneyball: The Art of Winning an Unfair Game (W. W. Norton & Company, 2004).
Data-science use cases
- anomaly detection
- assisting people with disabilities
- auto-insurance risk prediction
- automated closed captioning
- automated image captions
- automated investing
- autonomous ships
- brain mapping
- caller identification
- cancer diagnosis/treatment
- carbon emissions reduction
- classifying handwriting
- computer vision
- credit scoring
- crime: predicting locations
- crime: predicting recidivism
- crime: predictive policing
- crime: prevention
- CRISPR gene editing
- crop-yield improvement
- customer churn
- customer experience
- customer retention
- customer satisfaction
- customer service
- customer service agents
- customized diets
- cybersecurity
- data mining
- data visualization
- detecting new viruses
- diagnosing breast cancer
- diagnosing heart disease
- diagnostic medicine
- disaster-victim identification
- drones
- dynamic driving routes
- dynamic pricing
- electronic health records
- emotion detection
- energy-consumption reduction
- facial recognition
- fitness tracking
- fraud detection
- game playing
- genomics and healthcare
- Geographic Information Systems (GIS)
- GPS Systems
- health outcome improvement
- hospital readmission reduction
- human genome sequencing
- identity-theft prevention
- immunotherapy
- insurance pricing
- intelligent assistants
- Internet of Things (IoT) and medical device monitoring
- Internet of Things and weather forecasting
- inventory control
- language translation
- location-based services
- loyalty programs
- malware detection
- mapping
- marketing
- marketing analytics
- music generation
- natural-language translation
- new pharmaceuticals
- opioid abuse prevention
- personal assistants
- personalized medicine
- personalized shopping
- phishing elimination
- pollution reduction
- precision medicine
- predicting cancer survival
- predicting disease outbreaks
- predicting health outcomes
- predicting student enrollments
- predicting weather-sensitive product sales
- predictive analytics
- preventative medicine
- preventing disease outbreaks
- reading sign language
- real-estate valuation
- recommendation systems
- reducing overbooking
- ride sharing
- risk minimization
- robo financial advisors
- security enhancements
- self-driving cars
- sentiment analysis
- sharing economy
- similarity detection
- smart cities
- smart homes
- smart meters
- smart thermostats
- smart traffic control
- social analytics
- social graph analysis
- spam detection
- spatial data analysis
- sports recruiting and coaching
- stock market forecasting
- student performance assessment
- summarizing text
- telemedicine
- terrorist attack prevention
- theft prevention
- travel recommendations
- trend spotting
- visual product search
- voice recognition
- voice search
- weather forecasting