JuTT Developer Series

Python for Programmers

JuTT BaDshaH

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact me at juttbadshah1120@gmail.com or via WhatsApp.

All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms, and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearsoned.com/permissions/.

JuTT BaDshaH and the doublethumbsup bug are registered trademarks of JuTT BaDshaH and Associates, Inc.

Python logo courtesy of the Programming King.

Cover design by JuTT BaDshaH

Cover art by Juttbadshah/Shutterstock

ISBN13: 9780135224335

ISBN10: 0135224330

Preface

“There’s gold in them thar hills!”

Welcome to Python for Programmers! In this book, you’ll learn hands-on with today’s most compelling, leading-edge computing technologies, and you’ll program in Python, one of the world’s most popular languages and the fastest growing among them.

Developers often quickly discover that they like Python. They appreciate its expressive power, readability, conciseness and interactivity. They like the world of open-source software development that’s generating a rapidly growing base of reusable software for an enormous range of application areas.

For many decades, some powerful trends have been in place. Computer hardware has rapidly been getting faster, cheaper and smaller. Internet bandwidth has rapidly been getting larger and cheaper. And quality computer software has become ever more abundant and essentially free or nearly free through the “open source” movement. Soon, the “Internet of Things” will connect tens of billions of devices of every imaginable type. These will generate enormous volumes of data at rapidly increasing speeds and quantities.

In computing today, the latest innovations are “all about the data”: data science, data analytics, big data, relational databases (SQL), and NoSQL and NewSQL databases, each of which we address along with an innovative treatment of Python programming.

JOBS REQUIRING DATA SCIENCE SKILLS

In 2011, McKinsey Global Institute produced their report, “Big data: The next frontier for innovation, competition and productivity.” In it, they said, “The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.” This continues to be the case. The August 2018 “LinkedIn Workforce Report” says the United States has a shortage of over 150,000 people with data science skills. A 2017 report from IBM, Burning Glass Technologies and the Business-Higher Education Forum says that by 2020 in the United States there will be hundreds of thousands of new jobs requiring data science skills.

https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_full_report.ashx.
https://economicgraph.linkedin.com/resources/linkedinworkforce-reportaugust2018.
https://www.burningglass.com/wp-content/uploads/The_Quant_Crunch.pdf.

MODULAR ARCHITECTURE

The book’s modular architecture (please see the Table of Contents graphic on the book’s inside front cover) helps us meet the diverse needs of various professional audiences.

Chapters 1–10 cover Python programming. These chapters each include a brief Intro to Data Science section introducing artificial intelligence, basic descriptive statistics, measures of central tendency and dispersion, simulation, static and dynamic visualization, working with CSV files, pandas for data exploration and data wrangling, time series and simple linear regression. These help you prepare for the data science, AI, big data and cloud case studies in Chapters 11–16, which present opportunities for you to use real-world datasets in complete case studies.

After covering Python Chapters 1–5 and a few key parts of Chapters 6–7, you’ll be able to handle significant portions of the case studies in Chapters 11–16. The “Chapter Dependencies” section of this Preface will help trainers plan their professional courses in the context of the book’s unique architecture.

Chapters 11–16 are loaded with cool, powerful, contemporary examples. They present hands-on implementation case studies on topics such as natural language processing, data mining Twitter, cognitive computing with IBM’s Watson, supervised machine learning with classification and regression, unsupervised machine learning with clustering, deep learning with convolutional neural networks, deep learning with recurrent neural networks, big data with Hadoop, Spark and NoSQL databases, the Internet of Things and more. Along the way, you’ll acquire a broad literacy of data science terms and concepts, ranging from brief definitions to using concepts in small, medium and large programs. Browsing the book’s detailed Table of Contents and Index will give you a sense of the breadth of coverage.

KEY FEATURES

KIS (Keep It Simple), KIS (Keep it Small), KIT (Keep it Topical)

Keep it simple: In every aspect of the book, we strive for simplicity and clarity. For example, when we present natural language processing, we use the simple and intuitive TextBlob library rather than the more complex NLTK. In our deep learning presentation, we prefer Keras to TensorFlow. In general, when multiple libraries could be used to perform similar tasks, we use the simplest one.

Keep it small: Most of the book’s 538 examples are small, often just a few lines of code, with immediate interactive IPython feedback. We also include 40 larger scripts and in-depth case studies.

Keep it topical: We read scores of recent Python-programming and data science books, and browsed, read or watched about 15,000 current articles, research papers, white papers, videos, blog posts, forum posts and documentation pieces. This enabled us to “take the pulse” of the Python, computer science, data science, AI, big data and cloud communities.

Immediate-Feedback: Exploring, Discovering and Experimenting with IPython

The ideal way to learn from this book is to read it and run the code examples in parallel. Throughout the book, we use the IPython interpreter, which provides a friendly, immediate-feedback interactive mode for quickly exploring, discovering and experimenting with Python and its extensive libraries. Most of the code is presented in small, interactive IPython sessions. For each code snippet you write, IPython immediately reads it, evaluates it and prints the results. This instant feedback keeps your attention, boosts learning, facilitates rapid prototyping and speeds the software-development process.

Our books always emphasize the live-code approach, focusing on complete, working programs with live inputs and outputs. IPython’s “magic” is that it turns even snippets into code that “comes alive” as you enter each line. This promotes learning and encourages experimentation.

Python Programming Fundamentals

First and foremost, this book provides rich Python coverage. We discuss Python’s programming models: procedural programming, functional-style programming and object-oriented programming. We use best practices, emphasizing current idiom.

Functional-style programming is used throughout the book as appropriate. A chart in Chapter 4 lists most of Python’s key functional-style programming capabilities and the chapters in which we initially cover most of them.
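As a brief illustration of what functional-style programming can look like in Python, the following sketch computes the same result with a list comprehension, with the built-in filter and map functions, and with a generator expression. The data and variable names are assumptions chosen for illustration; they are not taken from the book's examples.

```python
# Functional-style sketch: the data and names below are illustrative only.
numbers = [3, 7, 1, 9, 4, 2, 8]

# List comprehension: square only the even values.
even_squares = [x ** 2 for x in numbers if x % 2 == 0]

# The same computation with the built-in filter and map functions.
even_squares_fm = list(map(lambda x: x ** 2,
                           filter(lambda x: x % 2 == 0, numbers)))

# Generator expression: sum lazily without building an intermediate list.
total = sum(x ** 2 for x in numbers if x % 2 == 0)

print(even_squares)
print(even_squares_fm)
print(total)
```

All three forms describe what to compute rather than how to loop, which is the essence of the functional style.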

538 Code Examples

You’ll get an engaging, challenging and entertaining introduction to Python with 538 real-world examples ranging from individual snippets to substantial computer science, data science, artificial intelligence and big data case studies. You’ll attack significant tasks with AI, big data and cloud technologies like natural language processing, data mining Twitter, machine learning, deep learning, Hadoop, MapReduce, Spark, IBM Watson, key data science libraries (NumPy, pandas, SciPy, NLTK, TextBlob, spaCy, Textatistic, Tweepy, scikit-learn, Keras), key visualization libraries (Matplotlib, Seaborn, Folium) and more.

Avoid Heavy Math in Favor of English Explanations

We capture the conceptual essence of the mathematics and put it to work in our examples. We do this by using libraries such as statistics, NumPy, SciPy, pandas and many others, which hide the mathematical complexity. So, it’s straightforward for you to get many of the benefits of mathematical techniques like linear regression without having to know the mathematics behind them. In the machine-learning and deep-learning examples, we focus on creating objects that do the math for you “behind the scenes.”
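To suggest how a library can hide the mathematics, here is a minimal sketch that uses the Python Standard Library's statistics module to compute a few descriptive statistics. The grades list is hypothetical sample data, not a dataset from the book.

```python
import statistics

# Hypothetical sample data for illustration.
grades = [85, 93, 45, 89, 85]

mean = statistics.mean(grades)       # arithmetic average
median = statistics.median(grades)   # middle value of the sorted data
mode = statistics.mode(grades)       # most frequent value
spread = statistics.pstdev(grades)   # population standard deviation

print(f'mean={mean}, median={median}, mode={mode}, pstdev={spread:.2f}')
```

Each call replaces a formula you would otherwise code by hand, so the program reads as a statement of intent.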

Visualizations

67 static, dynamic, animated and interactive visualizations (charts, graphs, pictures, animations etc.) help you understand concepts. Rather than including a treatment of low-level graphics programming, we focus on high-level visualizations produced by Matplotlib, Seaborn, pandas and Folium (for interactive maps). We use visualizations as a pedagogic tool. For example, we make the law of large numbers “come alive” in a dynamic die-rolling simulation and bar chart. As the number of rolls increases, you’ll see each face’s percentage of the total rolls gradually approach 16.667% (1/6th) and the sizes of the bars representing the percentages equalize.

Visualizations are crucial in big data for data exploration and communicating reproducible research results, where the data items can number in the millions, billions or more. A common saying is that a picture is worth a thousand words; in big data, a visualization could be worth billions, trillions or even more items in a database. Visualizations enable you to “fly 40,000 feet above the data” to see it “in the large” and to get to know your data. Descriptive statistics help but can be misleading. For example, Anscombe’s quartet demonstrates through visualizations that significantly different datasets can have nearly identical descriptive statistics.
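The die-rolling demonstration of the law of large numbers can be approximated in text form, without any graphics. The sketch below is a simplified stand-in for the book's animated bar chart; the function name face_percentages and the chosen roll counts are assumptions made for illustration.

```python
import random
from collections import Counter

def face_percentages(rolls, seed=None):
    """Roll a six-sided die `rolls` times; return each face's share in percent."""
    rng = random.Random(seed)  # a seed makes a run repeatable
    counts = Counter(rng.randrange(1, 7) for _ in range(rolls))
    return {face: 100 * counts[face] / rolls for face in range(1, 7)}

# More rolls should pull every face's share toward 100/6 percent.
for rolls in (60, 6_000, 600_000):
    percentages = face_percentages(rolls, seed=1)
    summary = ', '.join(f'{face}: {pct:.2f}%' for face, pct in percentages.items())
    print(f'{rolls:>7,} rolls -> {summary}')
```

With only 60 rolls the shares typically vary widely; by 600,000 rolls each should sit close to 16.667%.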

We show the visualization and animation code so you can implement your own. We also provide the animations in source code files and as Jupyter Notebooks, so you can conveniently customize the code and animation parameters, re-execute the animations and see the effects of the changes.

Data Experiences

Our Intro to Data Science sections and case studies in Chapters 11–16 provide rich data experiences. You’ll work with many real-world datasets and data sources. There’s an enormous variety of free open datasets available online for you to experiment with. Some of the sites we reference list hundreds or thousands of datasets. Many libraries you’ll use come bundled with popular datasets for experimentation. You’ll learn the steps required to obtain data and prepare it for analysis, analyze that data using many techniques, tune your models and communicate your results effectively, especially through visualization.

GitHub

GitHub is an excellent venue for finding open-source code to incorporate into your projects (and to contribute your code to the open-source community). It’s also a crucial element of the software developer’s arsenal with version control tools that help teams of developers manage open-source (and private) projects. You’ll use an extraordinary range of free and open-source Python and data science libraries, and free, free-trial and freemium offerings of software and cloud services. Many of the libraries are hosted on GitHub.

Hands-On Cloud Computing

Much of big data analytics occurs in the cloud, where it’s easy to scale dynamically the amount of hardware and software your applications need. You’ll work with various cloud-based services (some directly and some indirectly), including Twitter, Google Translate, IBM Watson, Microsoft Azure, OpenMapQuest, geopy, Dweet.io and PubNub. We encourage you to use free, free trial or freemium cloud services. We prefer those that don’t require a credit card because you don’t want to risk accidentally running up big bills. If you decide to use a service that requires a credit card, ensure that the tier you’re using for free will not automatically jump to a paid tier.

Database, Big Data and Big Data Infrastructure

According to IBM (Nov. 2016), 90% of the world’s data was created in the last two years. Evidence indicates that the speed of data creation is rapidly accelerating.

https://public.dhe.ibm.com/common/ssi/ecm/wr/en/wrl12345usen/watson-customerengagementwatsonmarketingwrotherpapersandreports-wrl12345usen20170719.pdf.

According to a March 2016 AnalyticsWeek article, within five years there will be over 50 billion devices connected to the Internet and by 2020 we’ll be producing 1.7 megabytes of new data every second for every person on the planet!

https://analyticsweek.com/content/bigdatafacts/.

We include a treatment of relational databases and SQL with SQLite. Databases are critical big data infrastructure for storing and manipulating the massive amounts of data you’ll process. Relational databases process structured data; they’re not geared to the unstructured and semi-structured data in big data applications.

So, as big data evolved, NoSQL and NewSQL databases were created to handle such data efficiently. We include a NoSQL and NewSQL overview and a hands-on case study with a MongoDB JSON document database. MongoDB is the most popular NoSQL database. We discuss big data hardware and software infrastructure in Chapter 16, “Big Data: Hadoop, Spark, NoSQL and IoT (Internet of Things).”
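For a sense of what working with a relational database and SQL through SQLite involves, the following sketch uses the Python Standard Library's sqlite3 module with an in-memory database. The authors table, its columns and its rows are hypothetical; they are not the book's sample database.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; the table and
# column names are hypothetical, not the book's sample database.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE authors (id INTEGER PRIMARY KEY, first TEXT, last TEXT)')
connection.executemany(
    'INSERT INTO authors (first, last) VALUES (?, ?)',
    [('Ada', 'Lovelace'), ('Grace', 'Hopper'), ('Alan', 'Turing')])
connection.commit()

# Structured data: every row has the same columns, so SQL can filter
# and sort it declaratively.
rows = connection.execute(
    'SELECT first, last FROM authors WHERE last > ? ORDER BY last',
    ('I',)).fetchall()
print(rows)
connection.close()
```

The fixed table schema is what makes the data "structured"; document databases such as MongoDB relax that requirement.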

Artificial Intelligence Case Studies

In case study Chapters 11–15, we present artificial intelligence topics, including natural language processing, data mining Twitter to perform sentiment analysis, cognitive computing with IBM Watson, supervised machine learning, unsupervised machine learning and deep learning. Chapter 16 presents the big data hardware and software infrastructure that enables computer scientists and data scientists to implement leading-edge AI-based solutions.

Built-In Collections: Lists, Tuples, Sets, Dictionaries

There’s little reason today for most application developers to build custom data structures. The book features a rich two-chapter treatment of Python’s built-in data structures (lists, tuples, dictionaries and sets) with which most data-structuring tasks can be accomplished.
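The four built-in collections can be contrasted in a few lines. This is a minimal sketch with made-up values and hypothetical variable names, intended only to show the character of each type.

```python
# Illustrative values only; the names are hypothetical.
temperatures = [72, 68, 75, 68]          # list: ordered, mutable, allows duplicates
location = (42.36, -71.06)               # tuple: ordered, immutable
unique_temps = set(temperatures)         # set: unordered, unique values only
word_counts = {'data': 3, 'python': 5}   # dictionary: maps keys to values

temperatures.append(80)                  # lists grow in place
latitude, longitude = location           # tuples unpack into variables
word_counts['python'] += 1               # update the value for an existing key
word_counts['cloud'] = word_counts.get('cloud', 0) + 1  # insert with a default

print(temperatures, sorted(unique_temps), word_counts)
```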

Array-Oriented Programming with NumPy Arrays and Pandas Series/DataFrames

We also focus on three key data structures from open-source libraries: NumPy arrays, pandas Series and pandas DataFrames. These are used extensively in data science, computer science, artificial intelligence and big data. NumPy offers as much as two orders of magnitude higher performance than built-in Python lists. We include in Chapter 7 a rich treatment of NumPy arrays. Many libraries, such as pandas, are built on NumPy. The Intro to Data Science sections in Chapters 7–9 introduce pandas Series and DataFrames, which along with NumPy arrays are then used throughout the remaining chapters.

File Processing and Serialization

Chapter 9 presents text-file processing, then demonstrates how to serialize objects using the popular JSON (JavaScript Object Notation) format. JSON is used frequently in the data science chapters. Many data science libraries provide built-in file-processing capabilities for loading datasets into your Python programs. In addition to plain text files, we process files in the popular CSV (comma-separated values) format using the Python Standard Library’s csv module and capabilities of the pandas data science library.
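JSON serialization and CSV processing with the Python Standard Library might look roughly like the sketch below. It uses an in-memory text buffer in place of a file on disk so that it runs anywhere; the account record is hypothetical.

```python
import csv
import io
import json

# Hypothetical account record, used only for illustration.
account = {'account': 100, 'name': 'Jones', 'balance': 24.98}

# Serialize the dictionary to JSON text, then load it back.
json_text = json.dumps(account)
restored = json.loads(json_text)

# Write and read CSV using an in-memory text buffer instead of a disk file.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['account', 'name', 'balance'])
writer.writerow([account['account'], account['name'], account['balance']])

buffer.seek(0)
records = list(csv.DictReader(buffer))  # every CSV field is read back as a string

print(restored)
print(records)
```

Note the difference in fidelity: JSON round-trips the numeric types, whereas CSV returns every field as text that you convert yourself.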

Object-Based Programming

We emphasize using the huge number of valuable classes that the Python open-source community has packaged into industry-standard class libraries. You’ll focus on knowing what libraries are out there, choosing the ones you’ll need for your apps, creating objects from existing classes (usually in one or two lines of code) and making them “jump, dance and sing.” This object-based programming enables you to build impressive applications quickly and concisely, which is a significant part of Python’s appeal. With this approach, you’ll be able to use machine learning, deep learning and other AI technologies to quickly solve a wide range of intriguing problems, including cognitive computing challenges like speech recognition and computer vision.

Object-Oriented Programming

Developing custom classes is a crucial object-oriented programming skill, along with inheritance, polymorphism and duck typing. We discuss these in Chapter 10. Chapter 10 includes a discussion of unit testing with doctest and a fun card-shuffling-and-dealing simulation.

Chapters 11–16 require only a few straightforward custom class definitions. In Python, you’ll probably use more of an object-based programming approach than full-out object-oriented programming.
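The terms inheritance, polymorphism and duck typing can be made concrete with a very small class hierarchy. The classes Shape, Rectangle and Plot below are hypothetical and are not the classes developed in Chapter 10.

```python
class Shape:
    """Base class; subclasses supply area()."""
    def area(self):
        raise NotImplementedError

    def describe(self):
        # Polymorphism: this call dispatches to the subclass's area().
        return f'{type(self).__name__} with area {self.area():.2f}'

class Rectangle(Shape):
    """A Shape defined via inheritance."""
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

class Plot:
    """Not a Shape, but it has an area() method."""
    def area(self):
        return 10.0

def total_area(items):
    # Duck typing: any object with an area() method is acceptable.
    return sum(item.area() for item in items)

print(Rectangle(2, 3).describe())
print(total_area([Rectangle(2, 3), Plot()]))
```

total_area never checks types; it relies only on each object responding to area(), which is the duck-typing idea.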

Reproducibility

In the sciences in general, and data science in particular, there’s a need to reproduce the results of experiments and studies, and to communicate those results effectively. Jupyter Notebooks are a preferred means for doing this. We discuss reproducibility throughout the book in the context of programming techniques and software such as Jupyter Notebooks and Docker.

Performance

We use the %timeit profiling tool in several examples to compare the performance of different approaches to performing the same tasks. Other performance-related discussions include generator expressions, NumPy arrays vs. Python lists, performance of machine-learning and deep-learning models, and Hadoop and Spark distributed-computing performance.
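%timeit is an IPython magic command, so it is available only inside IPython or Jupyter. As a stand-in that runs in a plain script, the sketch below uses the Standard Library's timeit module to time two ways of computing the same sum. The function names and problem size are assumptions, and the measured times will differ from machine to machine.

```python
import timeit

def sum_with_list(n):
    # Builds the whole list in memory before summing.
    return sum([x * x for x in range(n)])

def sum_with_generator(n):
    # Produces values one at a time; no intermediate list.
    return sum(x * x for x in range(n))

n = 100_000
list_time = timeit.timeit(lambda: sum_with_list(n), number=20)
generator_time = timeit.timeit(lambda: sum_with_generator(n), number=20)

print(f'list comprehension:   {list_time:.4f} s for 20 runs')
print(f'generator expression: {generator_time:.4f} s for 20 runs')
```

Which version is faster depends on the interpreter and the data size; the generator's main advantage is that it avoids holding the whole list in memory.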

Big Data and Parallelism

In this book, rather than writing your own parallelization code, you’ll let libraries like Keras running over TensorFlow, and big data tools like Hadoop and Spark parallelize operations for you. In this big data/AI era, the sheer processing requirements of massive data applications demand taking advantage of true parallelism provided by multicore processors, graphics processing units (GPUs), tensor processing units (TPUs) and huge clusters of computers in the cloud. Some big data tasks could have thousands of processors working in parallel to analyze massive amounts of data expeditiously.

CHAPTER DEPENDENCIES

If you’re a trainer planning your syllabus for a professional training course or a developer deciding which chapters to read, this section will help you make the best decisions. Please read the one-page color Table of Contents on the book’s inside front cover; this will quickly familiarize you with the book’s unique architecture. Teaching or reading the chapters in order is easiest. However, much of the content in the Intro to Data Science sections at the ends of Chapters 1–10 and the case studies in Chapters 11–16 requires only Chapters 1–5 and small portions of Chapters 6–10 as discussed below.

Part 1: Python Fundamentals Quickstart

We recommend that you read all the chapters in order:

Chapter 1, Introduction to Computers and Python, introduces concepts that lay the groundwork for the Python programming in Chapters 2–10 and the big data, artificial-intelligence and cloud-based case studies in Chapters 11–16. The chapter also includes test-drives of the IPython interpreter and Jupyter Notebooks.

Chapter 2, Introduction to Python Programming, presents Python programming fundamentals with code examples illustrating key language features.

Chapter 3, Control Statements, presents Python’s control statements and introduces basic list processing.

Chapter 4, Functions, introduces custom functions, presents simulation techniques with random-number generation and introduces tuple fundamentals.

Chapter 5, Sequences: Lists and Tuples, presents Python’s built-in list and tuple collections in more detail and begins introducing functional-style programming.

Part 2: Python Data Structures, Strings and Files

The following summarizes interchapter dependencies for Python Chapters 6–9 and assumes that you’ve read Chapters 1–5.

Chapter 6, Dictionaries and Sets: The Intro to Data Science section in this chapter is not dependent on the chapter’s contents.

Chapter 7, Array-Oriented Programming with NumPy: The Intro to Data Science section requires dictionaries (Chapter 6) and arrays (Chapter 7).

Chapter 8, Strings: A Deeper Look: The Intro to Data Science section requires raw strings and regular expressions (Sections 8.11–8.12), and pandas Series and DataFrame features from Section 7.14’s Intro to Data Science.

Chapter 9, Files and Exceptions: For JSON serialization, it’s useful to understand dictionary fundamentals (Section 6.2). Also, the Intro to Data Science section requires the built-in open function and the with statement (Section 9.3), and pandas DataFrame features from Section 7.14’s Intro to Data Science.

Part 3: Python High-End Topics

The following summarizes interchapter dependencies for Python Chapter 10 and assumes that you’ve read Chapters 1–5.

Chapter 10, Object-Oriented Programming: The Intro to Data Science section requires pandas DataFrame features from Intro to Data Science Section 7.14. Trainers wanting to cover only classes and objects can present Sections 10.1–10.6. Trainers wanting to cover more advanced topics like inheritance, polymorphism and duck typing can present Sections 10.7–10.9. Sections 10.10–10.15 provide additional advanced perspectives.

Part 4: AI, Cloud and Big Data Case Studies

The following summary of interchapter dependencies for Chapters 11–16 assumes that you’ve read Chapters 1–5. Most of Chapters 11–16 also require dictionary fundamentals from Section 6.2.
Chapter 11, Natural Language Processing (NLP), uses pandas DataFrame features from Section 7.14’s Intro to Data Science.
Chapter 12, Data Mining Twitter, uses pandas DataFrame features from Section 7.14’s Intro to Data Science, string method join (Section 8.9), JSON fundamentals (Section 9.5), TextBlob (Section 11.2) and Word clouds (Section 11.3). Several examples require defining a class via inheritance (Chapter 10).

Chapter 13, IBM Watson and Cognitive Computing, uses built-in function open and the with statement (Section 9.3).
Chapter 14, Machine Learning: Classification, Regression and Clustering, uses NumPy array fundamentals and method unique (Chapter 7), pandas DataFrame features from Section 7.14’s Intro to Data Science and Matplotlib function subplots (Section 10.6).
Chapter 15, Deep Learning, requires NumPy array fundamentals (Chapter 7), string method join (Section 8.9), general machine-learning concepts from Chapter 14 and features from Chapter 14’s Case Study: Classification with k-Nearest Neighbors and the Digits Dataset.
Chapter 16, Big Data: Hadoop, Spark, NoSQL and IoT, uses string method split (Section 6.2.7), Matplotlib FuncAnimation from Section 6.4’s Intro to Data Science, pandas Series and DataFrame features from Section 7.14’s Intro to Data Science, string method join (Section 8.9), the json module (Section 9.5), NLTK stop words (Section 11.2.13) and from Chapter 12, Twitter authentication, Tweepy’s StreamListener class for streaming tweets, and the geopy and folium libraries. A few examples require defining a class via inheritance (Chapter 10), but you can simply mimic the class definitions we provide without reading Chapter 10.

JUPYTER NOTEBOOKS

For your convenience, we provide the book’s code examples in Python source code (.py) files for use with the command-line IPython interpreter and as Jupyter Notebooks (.ipynb) files that you can load into your web browser and execute. Jupyter Notebooks is a free, open-source project that enables you to combine text, graphics, audio, video, and interactive coding functionality for entering, editing, executing, debugging, and modifying code quickly and conveniently in a web browser. According to the article, “What Is Jupyter?”:

Jupyter has become a standard for scientific research and data analysis. It packages computation and argument together, letting you build “computational narratives”; and it simplifies the problem of distributing working software to teammates and associates.

In our experience, it’s a wonderful learning environment and rapid prototyping tool. For this reason, we use Jupyter Notebooks rather than a traditional IDE, such as Eclipse, Visual Studio, PyCharm or Spyder. Academics and professionals already use Jupyter extensively for sharing research results. Jupyter Notebooks support is provided through the traditional open-source community mechanisms (see “Getting Jupyter Help” later in this Preface). See the Before You Begin section that follows this Preface for software installation details and see the test-drives in Section 1.5 for information on running the book’s examples.

https://jupyter.org/community.

Collaboration and Sharing Results

Working in teams and communicating research results are both important for developers in or moving into data-analytics positions in industry, government or academia:

The notebooks you create are easy to share among team members simply by copying the files or via GitHub.
Research results, including code and insights, can be shared as static web pages via tools like nbviewer (https://nbviewer.jupyter.org) and GitHub—both automatically render notebooks as web pages.

Reproducibility: A Strong Case for Jupyter Notebooks

In data science, and in the sciences in general, experiments and studies should be reproducible. This has been written about in the literature for many years, including Donald Knuth’s 1992 computer science publication, Literate Programming.

Knuth, D., “Literate Programming” (PDF), The Computer Journal, British Computer Society, 1992.

The article “Language-Agnostic Reproducible Data Analysis Using Literate Programming” says, “Lir (literate, reproducible computing) is based on the idea of literate programming as proposed by Donald Knuth.”

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164023.

Essentially, reproducibility captures the complete environment used to produce results: hardware, software, communications, algorithms (especially code), data and the data’s provenance (origin and lineage).

DOCKER

In Chapter 16, we’ll use Docker—a tool for packaging software into containers that bundle everything required to execute that software conveniently, reproducibly and portably across platforms. Some software packages we use in Chapter 16 require complicated setup and configuration. For many of these, you can download free preexisting Docker containers. These enable you to avoid complex installation issues and execute software locally on your desktop or notebook computers, making Docker a great way to help you get started with new technologies quickly and conveniently.

Docker also helps with reproducibility. You can create custom Docker containers that are configured with the versions of every piece of software and every library you used in your study. This would enable other developers to recreate the environment you used, then reproduce your work, and will help you reproduce your own results. In Chapter 16, you’ll use Docker to download and execute a container that’s preconfigured for you to code and run big data Spark applications using Jupyter Notebooks.

SPECIAL FEATURE: IBM WATSON ANALYTICS AND COGNITIVE COMPUTING

Early in our research for this book, we recognized the rapidly growing interest in IBM’s Watson. We investigated competitive services and found Watson’s “no credit card required” policy for its “free tiers” to be among the most friendly for our readers.

IBM Watson is a cognitive-computing platform being employed across a wide range of real-world scenarios. Cognitive-computing systems simulate the pattern-recognition and decision-making capabilities of the human brain to “learn” as they consume more data. We include a significant hands-on Watson treatment. We use the free Watson Developer Cloud: Python SDK, which provides APIs that enable you to interact with Watson’s services programmatically. Watson is fun to use and a great platform for letting your creative juices flow. You’ll demo or use the following Watson APIs: Conversation, Discovery, Language Translator, Natural Language Classifier, Natural Language Understanding, Personality Insights, Speech to Text, Text to Speech, Tone Analyzer and Visual Recognition.

Watson’s Lite Tier Services and a Cool Watson Case Study

IBM encourages learning and experimentation by providing free lite tiers for many of its APIs. In Chapter 13, you’ll try demos of many Watson services. Then, you’ll use the lite tiers of Watson’s Text to Speech, Speech to Text and Translate services to implement a “traveler’s assistant” translation app. You’ll speak a question in English, then the app will transcribe your speech to English text, translate the text to Spanish and speak the Spanish text. Next, you’ll speak a Spanish response (in case you don’t speak Spanish, we provide an audio file you can use). Then, the app will quickly transcribe the speech to Spanish text, translate the text to English and speak the English response. Cool stuff!

Always check the latest terms on IBM’s website, as the terms and services may change.

https://console.bluemix.net/catalog/.

TEACHING APPROACH

Python for Programmers contains a rich collection of examples drawn from many fields. You’ll work through interesting, real-world examples using real-world datasets. The book concentrates on the principles of good software engineering and stresses program clarity.

Using Fonts for Emphasis

We place the key terms and the index’s page reference for each defining occurrence in bold text for easier reference. We refer to onscreen components in the bold Helvetica font (for example, the File menu) and use the Lucida font for Python code (for example, x = 5).

Syntax Coloring

For readability, we syntax color all the code. Our syntax-coloring conventions are as follows:

comments appear in green
keywords appear in dark blue
constants and literal values appear in light blue
errors appear in red
all other code appears in black

538 Code Examples

The book’s 538 examples contain approximately 4000 lines of code. This is a relatively small amount for a book this size and is due to the fact that Python is such an expressive language. Also, our coding style is to use powerful class libraries to do most of the work wherever possible.

160 Tables/Illustrations/Visualizations

We include abundant tables, line drawings, and static, dynamic and interactive visualizations.

Programming Wisdom

We integrate into the discussions programming wisdom from the authors’ combined nine decades of programming and teaching experience, including:

Good programming practices and preferred Python idioms that help you produce clearer, more understandable and more maintainable programs.
Common programming errors to reduce the likelihood that you’ll make them.
Error-prevention tips with suggestions for exposing bugs and removing them from your programs. Many of these tips describe techniques for preventing bugs from getting into your programs in the first place.
Performance tips that highlight opportunities to make your programs run faster or minimize the amount of memory they occupy.
Software engineering observations that highlight architectural and design issues for proper software construction, especially for larger systems.

SOFTWARE USED IN THE BOOK

The software we use is available for Windows, macOS and Linux and is free for download from the Internet. We wrote the book’s examples using the free Anaconda Python distribution. It includes most of the Python, visualization and data science libraries you’ll need, as well as the IPython interpreter, Jupyter Notebooks and Spyder, considered one of the best Python data science IDEs. We use only IPython and Jupyter Notebooks for program development in the book. The Before You Begin section following this Preface discusses installing Anaconda and a few other items you’ll need for working with our examples.

PYTHON DOCUMENTATION

You’ll find the following documentation especially helpful as you work through the book:

The Python Language Reference:

https://docs.python.org/3/reference/index.html

The Python Standard Library:

https://docs.python.org/3/library/index.html

Python documentation list:

https://docs.python.org/3/

GETTING YOUR QUESTIONS ANSWERED

Popular Python and general programming online forums include:

Also, many vendors provide forums for their tools and libraries. Many of the libraries you’ll use in this book are managed and maintained at github.com. Some library maintainers provide support through the Issues tab on a given library’s GitHub page. If you cannot find an answer to your questions online, please see our web page for the book at

http://www.deitel.com

Our website is undergoing a major upgrade. If you do not find something you need, please write to us directly at juttbadshah1120@gmail.com.

GETTING JUPYTER HELP

Jupyter Notebooks support is provided through:

Project Jupyter Google Group:

https://groups.google.com/forum/#!forum/jupyter

Jupyter real-time chat room:

https://gitter.im/jupyter/jupyter

GitHub:

https://github.com/jupyter/help

StackOverflow:

https://stackoverflow.com/questions/tagged/jupyter

Jupyter for Education Google Group (for instructors teaching with Jupyter):

https://groups.google.com/forum/#!forum/jupytereducation

SUPPLEMENTS

To get the most out of the presentation, you should execute each code example in parallel with reading the corresponding discussion in the book. On the book’s web page at

http://www.deitel.com

we provide:

Downloadable Python source code (.py files) and Jupyter Notebooks (.ipynb files) for the book’s code examples.
Getting Started videos showing how to use the code examples with IPython and Jupyter Notebooks. We also introduce these tools in Section 1.5.
Blog posts and book updates.

For download instructions, see the Before You Begin section that follows this Preface.

Before You Begin

This section contains information you should review before using this book. We’ll post updates at: http://programming-king-juttbadshah.blogspot.com.

FONT AND NAMING CONVENTIONS

We show Python code and commands and file and folder names in a sans-serif font, and on-screen components, such as menu names, in a bold sans-serif font. We use italics for emphasis and bold occasionally for strong emphasis.

GETTING THE CODE EXAMPLES

You can download the examples.zip file containing the book’s examples from our Python for Programmers web page at:

http://programming-king-juttbadshah.com

Click the Download Examples link to save the file to your local computer. Most web browsers place the file in your user account’s Downloads folder. When the download completes, locate it on your system, and extract its examples folder into your user account’s Documents folder:

Windows: C:\Users\YourAccount\Documents\examples
macOS or Linux: ~/Documents/examples

Most operating systems have a built-in extraction tool. You also may use an archive tool such as 7-Zip (www.7-zip.org) or WinZip (www.winzip.com).
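If you prefer to do the extraction from Python, the standard library’s zipfile module can do it. The following is a minimal sketch, assuming the archive was saved as examples.zip in your Downloads folder and that it contains a top-level examples folder; extract_examples is our own illustrative helper, not part of the book’s code.

```python
import zipfile
from pathlib import Path

def extract_examples(archive, destination):
    """Extract every member of the ZIP archive into destination."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    return destination

if __name__ == '__main__':
    # Assumed download location; adjust it if your browser saves elsewhere.
    archive = Path.home() / 'Downloads' / 'examples.zip'
    if archive.exists():
        target = extract_examples(archive, Path.home() / 'Documents')
        print(f'Extracted to {target / "examples"}')
    else:
        print(f'{archive} not found; download it first.')
```

Path.home() resolves to your user account’s folder on Windows, macOS and Linux, so the same script covers both locations listed above.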

STRUCTURE OF THE EXAMPLES FOLDER

You’ll execute three kinds of examples in this book:

Individual code snippets in the IPython interactive environment.
Complete applications, which are known as scripts.
Jupyter Notebooks, a convenient interactive, web-browser-based environment in which you can write and execute code and intermix the code with text, images and video.

We demonstrate each in Section 1.5’s test drives.

The examples folder contains one subfolder per chapter. These are named ch##, where ## is the two-digit chapter number 01 to 16 (for example, ch01). Except for Chapters 13, 15 and 16, each chapter’s folder contains the following items:

snippets_ipynb—A folder containing the chapter’s Jupyter Notebook files.
snippets_py—A folder containing Python source code files in which each code snippet we present is separated from the next by a blank line. You can copy and paste these snippets into IPython or into new Jupyter Notebooks that you create.
Script files and their supporting files.

Chapter 13 contains one application. Chapters 15 and 16 explain where to find the files you need in the ch15 and ch16 folders, respectively.
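The naming scheme above is easy to reproduce in code, which can help you confirm that the extraction worked. This is a small sketch under the layout just described; chapter_folders and missing_snippet_folders are hypothetical helper names of our own.

```python
from pathlib import Path

def chapter_folders(first=1, last=16):
    """Return the chapter folder names ch01 through ch16."""
    return [f'ch{number:02d}' for number in range(first, last + 1)]

def missing_snippet_folders(examples_root):
    """List expected snippet folders that are absent under examples_root."""
    special = {'ch13', 'ch15', 'ch16'}  # these chapters are organized differently
    missing = []
    for chapter in chapter_folders():
        if chapter in special:
            continue
        for name in ('snippets_ipynb', 'snippets_py'):
            folder = Path(examples_root) / chapter / name
            if not folder.is_dir():
                missing.append(folder)
    return missing

print(chapter_folders()[:3])
```

Passing your examples folder to missing_snippet_folders returns an empty list when every expected snippets folder is present.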

INSTALLING ANACONDA

We use the easy-to-install Anaconda Python distribution with this book. It comes with almost everything you’ll need to work with our examples, including:

the IPython interpreter,
most of the Python and data science libraries we use,
a local Jupyter Notebooks server so you can load and execute our notebooks, and
various other software packages, such as the Spyder Integrated Development Environment (IDE). We use only IPython and Jupyter Notebooks in this book.

Download the Python 3.x Anaconda installer for Windows, macOS or Linux from:

https://www.anaconda.com/download/

When the download completes, run the installer and follow the onscreen instructions. To ensure that Anaconda runs correctly, do not move its files after you install it.
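After installation, you may want to confirm that the interpreter you are running is a Python 3.x interpreter and see where it lives on disk. The following is a minimal check using only the standard sys module; is_python3 is an illustrative helper name of our own.

```python
import sys

def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3."""
    return version_info[0] == 3

print(sys.version)      # full version string of the running interpreter
print(sys.executable)   # path of the interpreter; check that it is Anaconda's
print('Python 3.x detected' if is_python3() else 'Not a Python 3.x interpreter')
```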

UPDATING ANACONDA

Next, ensure that Anaconda is up to date. Open a command-line window on your system as follows:

On macOS, open a Terminal from the Applications folder’s Utilities subfolder.
On Windows, open the Anaconda Prompt from the start menu. When doing this to update Anaconda (as you’ll do here) or to install new packages (discussed momentarily), execute the Anaconda Prompt as an administrator by right-clicking, then selecting More > Run as administrator. (If you cannot find the Anaconda Prompt in the start menu, simply search for it in the Type here to search field at the bottom of your screen.)
On Linux, open your system’s Terminal or shell (this varies by Linux distribution).

In your system’s command-line window, execute the following commands to update Anaconda’s installed packages to their latest versions:

1. conda update conda

2. conda update --all

PACKAGE MANAGERS

The conda command used above invokes the conda package manager—one of the two key Python package managers you’ll use in this book. The other is pip. Packages contain the files required to install a given Python library or tool. Throughout the book, you’ll use conda to install additional packages, unless those packages are not available through conda, in which case you’ll use pip. Some people prefer to use pip exclusively as it currently supports more packages. If you ever have trouble installing a package with conda, try pip instead.
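The “try conda first, then pip” rule can be expressed as a short script. This is only a sketch of the idea, not a recommended installer: install and run_command are hypothetical helpers of our own, and we assume that conda is on your PATH and can be launched directly (on Windows this generally means running the script from the Anaconda Prompt).

```python
import subprocess
import sys

def run_command(command):
    """Run a command and return its exit status (0 means success)."""
    try:
        return subprocess.run(command).returncode
    except FileNotFoundError:  # for example, conda is not on the PATH
        return 1

def install(package, run=run_command):
    """Try conda first; fall back to pip if conda fails."""
    if run(['conda', 'install', '--yes', package]) == 0:
        return 'conda'
    if run([sys.executable, '-m', 'pip', 'install', package]) == 0:
        return 'pip'
    return None  # neither package manager succeeded
```

The return value reports which package manager succeeded. The run parameter exists so the fallback logic can be exercised with a stand-in function, without actually installing anything.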

INSTALLING THE PROSPECTOR STATIC CODE ANALYSIS TOOL

You may want to analyze your Python code using the Prospector analysis tool, which checks your code for common errors and helps you improve it. To install Prospector and the Python libraries it uses, run the following command in the command-line window:

pip install prospector

INSTALLING JUPYTER-MATPLOTLIB

We implement several animations using a visualization library called Matplotlib. To use them in Jupyter Notebooks, you must install a tool called ipympl (https://github.com/matplotlib/jupyter-matplotlib). In the Terminal, Anaconda Prompt or shell you opened previously, execute the following commands one at a time:

conda install -c conda-forge ipympl

conda install nodejs

jupyter labextension install @jupyter-widgets/jupyterlab-manager

jupyter labextension install jupyter-matplotlib

INSTALLING THE OTHER PACKAGES

Anaconda comes with approximately 300 popular Python and data science packages for you, such as NumPy, Matplotlib, pandas, Regex, BeautifulSoup, requests, Bokeh, SciPy, SciKit-Learn, Seaborn, Spacy, sqlite, statsmodels and many more. The number of additional packages you’ll need to install throughout the book will be small and we’ll provide installation instructions as necessary. As you discover new packages, their documentation will explain how to install them.
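To see which of these libraries your installation can already import, you can ask Python directly. The sketch below uses the standard library’s importlib.util.find_spec function; available is an illustrative helper of our own, and the list holds the import names for a few of the packages mentioned above (for instance, scikit-learn is imported as sklearn and BeautifulSoup as bs4).

```python
import importlib.util

def available(module_names):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None
            for name in module_names}

# Import names for some of the packages listed above.
modules = ['numpy', 'matplotlib', 'pandas', 'bs4', 'requests',
           'scipy', 'sklearn', 'seaborn', 'statsmodels']

for name, found in available(modules).items():
    print(f'{name:12} {"installed" if found else "missing"}')
```

Any name reported as missing is a candidate for conda install or pip install.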

GET A TWITTER DEVELOPER ACCOUNT

If you intend to use our “Data Mining Twitter” chapter and any Twitter-based examples in subsequent chapters, apply for a Twitter developer account. Twitter now requires registration for access to their APIs. To apply, fill out and submit the application at

https://developer.twitter.com/en/apply-for-access

Twitter reviews every application. At the time of this writing, personal developer accounts were being approved immediately and company-account applications were taking from several days to several weeks. Approval is not guaranteed.

INTERNET CONNECTION REQUIRED IN SOME CHAPTERS

While using this book, you’ll need an Internet connection to install various additional Python libraries. In some chapters, you’ll register for accounts with cloud-based services, mostly to use their free tiers. Some services require credit cards to verify your identity. In a few cases, you’ll use services that are not free. In these cases, you’ll take advantage of monetary credits provided by the vendors so you can try their services without incurring charges. Caution: Some cloud-based services incur costs once you set them up. When you complete our case studies using such services, be sure to promptly delete the resources you allocated.

SLIGHT DIFFERENCES IN PROGRAM OUTPUTS

When you execute our examples, you might notice some differences between the results we show and your own results:

Due to differences in how calculations are performed with floating-point numbers (like –123.45, 7.5 or 0.0236937) across operating systems, you might see minor variations in outputs, especially in digits far to the right of the decimal point.
When we show outputs that appear in separate windows, we crop the windows to remove their borders.
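The floating-point variations mentioned above stem from the fact that most decimal fractions cannot be represented exactly in binary. The short sketch below, which is our own illustration rather than one of the book’s examples, shows the effect on a single machine and two common ways to keep such noise from mattering: comparing with a tolerance and rounding for display.

```python
import math

total = 0.1 + 0.2

print(total)                      # may show digits beyond 0.3
print(total == 0.3)               # exact comparison is fragile
print(math.isclose(total, 0.3))   # tolerance-based comparison
print(f'{total:.2f}')             # rounding for display hides the noise
```

When you compare your results with ours, treat differences in the last few digits the same way.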

GETTING YOUR QUESTIONS ANSWERED

Online forums enable you to interact with other Python programmers and get your Python questions answered. Popular Python and general programming forums include:

Also, many vendors provide forums for their tools and libraries. Most of the libraries you’ll use in this book are managed and maintained at github.com. Some library maintainers provide support through the Issues tab on a given library’s GitHub page.

If you cannot find an answer to your questions online, please see our web page for the book at

http://programming-king-juttbadshah.blogspot.com

Our website is undergoing a major upgrade. If you do not find something you need, please write to us directly at juttbadshah1120@gmail.com.

You’re now ready to begin reading Python for Programmers. We hope you enjoy the Course!


