JuTT Developer Series
Python for Programmers
JuTT BaDshaH
Preface
JOBS REQUIRING DATA SCIENCE SKILLS
- https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_full_report.ashx.
- https://economicgraph.linkedin.com/resources/linkedinworkforce-reportaugust2018.
- https://www.burningglass.com/wp-content/uploads/The_Quant_Crunch.pdf.
KEY FEATURES
KIS (Keep It Simple), KIS (Keep It Small), KIT (Keep It Topical)
Immediate-Feedback: Exploring, Discovering and Experimenting with IPython
Python Programming Fundamentals
538 Code Examples
Avoid Heavy Math in Favor of English Explanations
Visualizations
Data Experiences
GitHub
Hands-On Cloud Computing
Database, Big Data and Big Data Infrastructure
Artificial Intelligence Case Studies
Built-In Collections: Lists, Tuples, Sets, Dictionaries
Array-Oriented Programming with NumPy Arrays and Pandas Series/DataFrames
File Processing and Serialization
Object-Based Programming
Object-Oriented Programming
Reproducibility
Performance
Big Data and Parallelism
CHAPTER DEPENDENCIES
If you’re a trainer planning your syllabus for a professional training course or a developer deciding which chapters to read, this section will help you make the best decisions. Please read the one-page color Table of Contents on the book’s inside front cover; it will quickly familiarize you with the book’s unique architecture. Teaching or reading the chapters in order is easiest. However, much of the content in the Intro to Data Science sections at the ends of Chapters 1–10 and the case studies in Chapters 11–16 requires only Chapters 1–5 and small portions of Chapters 6–10, as discussed below.
Part 1: Python Fundamentals Quickstart
We recommend that you read all the chapters in order:
Chapter 1, Introduction to Computers and Python, introduces concepts that lay the groundwork for the Python programming in Chapters 2–10 and the big data, artificial-intelligence and cloud-based case studies in Chapters 11–16. The chapter also includes test-drives of the IPython interpreter and Jupyter Notebooks.
Chapter 2, Introduction to Python Programming, presents Python programming fundamentals with code examples illustrating key language features.
Chapter 3, Control Statements, presents Python’s control statements and introduces basic list processing.
Chapter 4, Functions, introduces custom functions, presents simulation techniques with random-number generation and introduces tuple fundamentals.
Chapter 5, Sequences: Lists and Tuples, presents Python’s built-in list and tuple collections in more detail and begins introducing functional-style programming.
Part 2: Python Data Structures, Strings and Files
The following summarizes inter-chapter dependencies for Python Chapters 6–9 and assumes that you’ve read Chapters 1–5.
Chapter 6, Dictionaries and Sets: The Intro to Data Science section in this chapter is not dependent on the chapter’s contents.
Chapter 7, Array-Oriented Programming with NumPy: The Intro to Data Science section requires dictionaries (Chapter 6) and arrays (Chapter 7).
Chapter 8, Strings: A Deeper Look: The Intro to Data Science section requires raw strings and regular expressions (Sections 8.11–8.12), and pandas Series and DataFrame features from Section 7.14’s Intro to Data Science.
Chapter 9, Files and Exceptions: For JSON serialization, it’s useful to understand dictionary fundamentals (Section 6.2). Also, the Intro to Data Science section requires the built-in open function and the with statement (Section 9.3), and pandas DataFrame features from Section 7.14’s Intro to Data Science.
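To make the Chapter 9 dependency concrete, here is a minimal sketch of JSON serialization that combines a dictionary, the built-in open function and the with statement. The dictionary contents and the file name accounts.json are hypothetical and are not taken from the book's examples.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical data; any dictionary of JSON-compatible values works.
accounts = {"accounts": [{"account": 100, "name": "Jones", "balance": 24.98}]}

# Work in a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "accounts.json"

    # The with statement closes the file automatically, even on an exception.
    with open(path, mode="w", encoding="utf-8") as outfile:
        json.dump(accounts, outfile)

    # Read the JSON text back and convert it to a dictionary again.
    with open(path, mode="r", encoding="utf-8") as infile:
        loaded = json.load(infile)

print(loaded == accounts)
```

The round trip works here because every value in the dictionary maps directly onto a JSON type. Objects of custom classes would need extra conversion code.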
Part 3: Python High-End Topics
The following summarizes inter-chapter dependencies for Python Chapter 10 and assumes that you’ve read Chapters 1–5.
Chapter 10, Object-Oriented Programming: The Intro to Data Science section requires pandas DataFrame features from Intro to Data Science Section 7.14. Trainers wanting to cover only classes and objects can present Sections 10.1–10.6. Trainers wanting to cover more advanced topics like inheritance, polymorphism and duck typing can present Sections 10.7–10.9. Sections 10.10–10.15 provide additional advanced perspectives.
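As a rough illustration of the inheritance, polymorphism and duck-typing topics named above, consider the following minimal sketch. The class names and figures are hypothetical and are not the book's own examples.

```python
class Employee:
    """Hypothetical base class with a fixed weekly salary."""

    def __init__(self, name, weekly_salary):
        self.name = name
        self.weekly_salary = weekly_salary

    def earnings(self):
        return self.weekly_salary


class CommissionEmployee(Employee):
    """Inherits from Employee and overrides earnings (polymorphism)."""

    def __init__(self, name, weekly_salary, gross_sales, rate):
        super().__init__(name, weekly_salary)
        self.gross_sales = gross_sales
        self.rate = rate

    def earnings(self):
        return super().earnings() + self.gross_sales * self.rate


class Contractor:
    """Unrelated to Employee; usable below only because it has earnings()."""

    def __init__(self, hours, hourly_rate):
        self.hours = hours
        self.hourly_rate = hourly_rate

    def earnings(self):
        return self.hours * self.hourly_rate


workers = [
    Employee("Sue", 800),
    CommissionEmployee("Bob", 500, 10000, 0.25),
    Contractor(10, 40),
]

# The loop never checks types; it relies only on each object's earnings method.
totals = [worker.earnings() for worker in workers]
print(totals)
```

The first two classes show inheritance and method overriding. The third shows duck typing: it shares no base class with the others but is processed by the same loop.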
Part 4: AI, Cloud and Big Data Case Studies
The following summary of inter-chapter dependencies for Chapters 11–16 assumes that you’ve read Chapters 1–5. Most of Chapters 11–16 also require dictionary fundamentals from Section 6.2.
- Chapter 11, Natural Language Processing (NLP), uses pandas DataFrame features from Section 7.14’s Intro to Data Science.
- Chapter 12, Data Mining Twitter, uses pandas DataFrame features from Section 7.14’s Intro to Data Science, string method join (Section 8.9), JSON fundamentals (Section 9.5), TextBlob (Section 11.2) and Word clouds (Section 11.3). Several examples require defining a class via inheritance (Chapter 10).
- Chapter 13, IBM Watson and Cognitive Computing, uses built-in function open and the with statement (Section 9.3).
- Chapter 14, Machine Learning: Classification, Regression and Clustering, uses NumPy array fundamentals and method unique (Chapter 7), pandas DataFrame features from Section 7.14’s Intro to Data Science and Matplotlib function subplots (Section 10.6).
- Chapter 15, Deep Learning, requires NumPy array fundamentals (Chapter 7), string method join (Section 8.9), general machine-learning concepts from Chapter 14 and features from Chapter 14’s Case Study: Classification with k-Nearest Neighbors and the Digits Dataset.
- Chapter 16, Big Data: Hadoop, Spark, NoSQL and IoT, uses string method split (Section 6.2.7), Matplotlib FuncAnimation from Section 6.4’s Intro to Data Science, pandas Series and DataFrame features from Section 7.14’s Intro to Data Science, string method join (Section 8.9), the json module (Section 9.5), NLTK stop words (Section 11.2.13) and, from Chapter 12, Twitter authentication, Tweepy’s StreamListener class for streaming tweets, and the geopy and folium libraries. A few examples require defining a class via inheritance (Chapter 10), but you can simply mimic the class definitions we provide without reading Chapter 10.
JUPYTER NOTEBOOKS
For your convenience, we provide the book’s code examples in Python source code (.py) files for use with the command-line IPython interpreter and as Jupyter Notebooks (.ipynb) files that you can load into your web browser and execute. Jupyter Notebooks is a free, open-source project that enables you to combine text, graphics, audio, video, and interactive coding functionality for entering, editing, executing, debugging, and modifying code quickly and conveniently in a web browser. According to the article, “What Is Jupyter?”:
Jupyter has become a standard for scientific research and data analysis. It packages computation and argument together, letting you build “computational narratives”; and it simplifies the problem of distributing working software to teammates and associates.
In our experience, it’s a wonderful learning environment and rapid prototyping tool. For this reason, we use Jupyter Notebooks rather than a traditional IDE, such as Eclipse, Visual Studio, PyCharm or Spyder. Academics and professionals already use Jupyter extensively for sharing research results. Jupyter Notebooks support is provided through the traditional open-source community mechanisms (see “Getting Jupyter Help” later in this Preface). See the Before You Begin section that follows this Preface for software installation details and see the test-drives in Section 1.5 for information on running the book’s examples.
https://jupyter.org/community.
Collaboration and Sharing Results
Working in teams and communicating research results are both important for developers in or moving into data-analytics positions in industry, government or academia:
- The notebooks you create are easy to share among team members simply by copying the files or via GitHub.
- Research results, including code and insights, can be shared as static web pages via tools like nbviewer (https://nbviewer.jupyter.org) and GitHub—both automatically render notebooks as web pages.
Reproducibility: A Strong Case for Jupyter Notebooks
In data science, and in the sciences in general, experiments and studies should be reproducible. This has been written about in the literature for many years, including
- Donald Knuth’s 1992 computer science publication—Literate Programming.
Knuth, D., “Literate Programming” (PDF), The Computer Journal, British Computer Society, 1992.
- The article “Language-Agnostic Reproducible Data Analysis Using Literate Programming,” which says, “Lir (literate, reproducible computing) is based on the idea of literate programming as proposed by Donald Knuth.”
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164023.
Essentially, reproducibility captures the complete environment used to produce results: hardware, software, communications, algorithms (especially code), data and the data’s provenance (origin and lineage).
DOCKER
In Chapter 16, we’ll use Docker—a tool for packaging software into containers that bundle everything required to execute that software conveniently, reproducibly and portably across platforms. Some software packages we use in Chapter 16 require complicated setup and configuration. For many of these, you can download free preexisting Docker containers. These enable you to avoid complex installation issues and execute software locally on your desktop or notebook computers, making Docker a great way to help you get started with new technologies quickly and conveniently.
Docker also helps with reproducibility. You can create custom Docker containers that are configured with the versions of every piece of software and every library you used in your study. This would enable other developers to recreate the environment you used, then reproduce your work, and will help you reproduce your own results. In Chapter 16, you’ll use Docker to download and execute a container that’s preconfigured for you to code and run big data Spark applications using Jupyter Notebooks.
SPECIAL FEATURE: IBM WATSON ANALYTICS AND COGNITIVE COMPUTING
Early in our research for this book, we recognized the rapidly growing interest in IBM’s Watson. We investigated competitive services and found Watson’s “no credit card required” policy for its “free tiers” to be among the most friendly for our readers.
IBM Watson is a cognitive-computing platform being employed across a wide range of real-world scenarios. Cognitive-computing systems simulate the pattern-recognition and decision-making capabilities of the human brain to “learn” as they consume more data. We include a significant hands-on Watson treatment. We use the free Watson Developer Cloud: Python SDK, which provides APIs that enable you to interact with Watson’s services programmatically. Watson is fun to use and a great platform for letting your creative juices flow. You’ll demo or use the following Watson APIs: Conversation, Discovery, Language Translator, Natural Language Classifier, Natural Language Understanding, Personality Insights, Speech to Text, Text to Speech, Tone Analyzer and Visual Recognition.
- http://whatis.techtarget.com/definition/cognitivecomputing.
- https://en.wikipedia.org/wiki/Cognitive_computing.
- https://www.forbes.com/sites/bernardmarr/2016/03/23/whateveryone-shouldknowaboutcognitivecomputing.
Watson’s Lite Tier Services and a Cool Watson Case Study
IBM encourages learning and experimentation by providing free lite tiers for many of its APIs. In Chapter 13, you’ll try demos of many Watson services. Then, you’ll use the lite tiers of Watson’s Text to Speech, Speech to Text and Translate services to implement a “traveler’s assistant” translation app. You’ll speak a question in English, then the app will transcribe your speech to English text, translate the text to Spanish and speak the Spanish text. Next, you’ll speak a Spanish response (in case you don’t speak Spanish, we provide an audio file you can use). Then, the app will quickly transcribe the speech to Spanish text, translate the text to English and speak the English response. Cool stuff!
Always check the latest terms on IBM’s website, as the terms and services may change.
TEACHING APPROACH
Python for Programmers contains a rich collection of examples drawn from many fields. You’ll work through interesting, real-world examples using real-world datasets. The book concentrates on the principles of good software engineering and stresses program clarity.
Using Fonts for Emphasis
We place the key terms and the index’s page reference for each defining occurrence in bold text for easier reference. We refer to on-screen components in the bold Helvetica font (for example, the File menu) and use the Lucida font for Python code (for example, x = 5).
Syntax Coloring
For readability, we syntax color all the code. Our syntax-coloring conventions are as follows:
- comments appear in green
- keywords appear in dark blue
- constants and literal values appear in light blue
- errors appear in red
- all other code appears in black
538 Code Examples
The book’s 538 examples contain approximately 4000 lines of code. This is a relatively small amount for a book this size because Python is such an expressive language. Also, our coding style is to use powerful class libraries to do most of the work wherever possible.
160 Tables/Illustrations/Visualizations
We include abundant tables, line drawings, and static, dynamic and interactive visualizations.
Programming Wisdom
We integrate into the discussions programming wisdom from the authors’ combined nine decades of programming and teaching experience, including:
- Good programming practices and preferred Python idioms that help you produce clearer, more understandable and more maintainable programs.
- Common programming errors to reduce the likelihood that you’ll make them.
- Error-prevention tips with suggestions for exposing bugs and removing them from your programs. Many of these tips describe techniques for preventing bugs from getting into your programs in the first place.
- Performance tips that highlight opportunities to make your programs run faster or minimize the amount of memory they occupy.
- Software engineering observations that highlight architectural and design issues for proper software construction, especially for larger systems.
SOFTWARE USED IN THE BOOK
The software we use is available for Windows, macOS and Linux and is free for download from the Internet. We wrote the book’s examples using the free Anaconda Python distribution. It includes most of the Python, visualization and data science libraries you’ll need, as well as the IPython interpreter, Jupyter Notebooks and Spyder, considered one of the best Python data science IDEs. We use only IPython and Jupyter Notebooks for program development in the book. The Before You Begin section following this Preface discusses installing Anaconda and a few other items you’ll need for working with our examples.
PYTHON DOCUMENTATION
You’ll find the following documentation especially helpful as you work through the book:
- The Python Language Reference:
https://docs.python.org/3/reference/index.html
- The Python Standard Library:
https://docs.python.org/3/library/index.html
- Python documentation list:
GETTING YOUR QUESTIONS ANSWERED
Popular Python and general programming online forums include:
Also, many vendors provide forums for their tools and libraries. Many of the libraries you’ll use in this book are managed and maintained at github.com. Some library maintainers provide support through the Issues tab on a given library’s GitHub page. If you cannot find an answer to your questions online, please see our web page for the book at
Our website is undergoing a major upgrade. If you do not find something you need, please write to us directly at juttbadshah1120@gmail.com.
GETTING JUPYTER HELP
Jupyter Notebooks support is provided through:
- Project Jupyter Google Group:
https://groups.google.com/forum/#!forum/jupyter
- Jupyter real-time chat room:
https://gitter.im/jupyter/jupyter
- GitHub:
https://github.com/jupyter/help
- StackOverflow:
https://stackoverflow.com/questions/tagged/jupyter
- Jupyter for Education Google Group (for instructors teaching with Jupyter):
https://groups.google.com/forum/#!forum/jupytereducation
SUPPLEMENTS
To get the most out of the presentation, you should execute each code example in parallel with reading the corresponding discussion in the book. On the book’s web page at
http://www.deitel.com
we provide:
- Downloadable Python source code (.py files) and Jupyter Notebooks (.ipynb files) for the book’s code examples.
- Getting Started videos showing how to use the code examples with IPython and Jupyter Notebooks. We also introduce these tools in Section 1.5.
- Blog posts and book updates.
For download instructions, see the Before You Begin section that follows this Preface.
Before You Begin
FONT AND NAMING CONVENTIONS
GETTING THE CODE EXAMPLES
- Windows: C:\Users\YourAccount\Documents\examples
- macOS or Linux: ~/Documents/examples
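If you want to refer to this folder from your own code, one portable option is to build the path from the current user's home directory instead of hard-coding a platform-specific string. This is a minimal sketch that assumes you placed the examples in the Documents folder as shown above.

```python
from pathlib import Path

# Path.home() resolves to C:\Users\YourAccount on Windows
# and to the ~ directory on macOS or Linux.
examples = Path.home() / "Documents" / "examples"

print(examples)
print(examples.exists())  # False until you have extracted the examples there
```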
STRUCTURE OF THE EXAMPLES FOLDER
- Individual code snippets in the IPython interactive environment.
- Complete applications, which are known as scripts.
- Jupyter Notebooks: a convenient interactive, web-browser-based environment in which you can write and execute code and intermix the code with text, images and video.
- snippets_ipynb—A folder containing the chapter’s Jupyter Notebook files.
- snippets_py—A folder containing Python source code files in which each code snippet we present is separated from the next by a blank line. You can copy and paste these snippets into IPython or into new Jupyter Notebooks that you create.
- Script files and their supporting files.
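Because the snippets_py files separate snippets with blank lines, a file's text can be split back into individual snippets. The sketch below uses a hypothetical three-snippet string in place of a real file. It also assumes that no snippet contains a blank line itself, which may not hold for longer snippets.

```python
# Hypothetical stand-in for the text of one snippets_py file.
source = """x = 7

y = x + 3

print(y)
"""

# A blank line shows up as two consecutive newline characters.
snippets = [block.strip() for block in source.split("\n\n") if block.strip()]

for number, snippet in enumerate(snippets, start=1):
    print(f"Snippet {number}: {snippet}")
```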
INSTALLING ANACONDA
- the IPython interpreter,
- most of the Python and data science libraries we use,
- a local Jupyter Notebooks server so you can load and execute our notebooks, and
- various other software packages, such as the Spyder Integrated Development Environment (IDE). We use only IPython and Jupyter Notebooks in this book.
UPDATING ANACONDA
- On macOS, open a Terminal from the Applications folder’s Utilities subfolder.
- On Windows, open the Anaconda Prompt from the start menu. When doing this to update Anaconda (as you’ll do here) or to install new packages (discussed momentarily), execute the Anaconda Prompt as an administrator by right-clicking, then selecting More > Run as administrator. (If you cannot find the Anaconda Prompt in the start menu, simply search for it in the Type here to search field at the bottom of your screen.)
- On Linux, open your system’s Terminal or shell (this varies by Linux distribution).
PACKAGE MANAGERS
INSTALLING THE PROSPECTOR STATIC CODE ANALYSIS TOOL
You may want to analyze your Python code using the Prospector analysis tool, which checks your code for common errors and helps you improve it. To install Prospector and the Python libraries it uses, run the following command in the command-line window:
pip install prospector
INSTALLING JUPYTER-MATPLOTLIB
We implement several animations using a visualization library called Matplotlib. To use them in Jupyter Notebooks, you must install a tool called ipympl. In the Terminal, Anaconda Command Prompt or shell you opened previously, execute the following commands one at a time:
conda install -c conda-forge ipympl
conda install nodejs
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install jupyter-matplotlib
INSTALLING THE OTHER PACKAGES
Anaconda comes with approximately 300 popular Python and data science packages for you, such as NumPy, Matplotlib, pandas, Regex, BeautifulSoup, requests, Bokeh, SciPy, SciKit-Learn, Seaborn, Spacy, sqlite, statsmodels and many more. The number of additional packages you’ll need to install throughout the book will be small and we’ll provide installation instructions as necessary. As you discover new packages, their documentation will explain how to install them.
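To check whether a given package is already available in your environment before installing anything, you can ask Python's import machinery directly. This is a minimal sketch using the standard library. The helper name is_installed is our own, and the names passed to it are import names, which sometimes differ from a package's install name.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module with this import name can be found."""
    return find_spec(module_name) is not None


# Check a few import names; results depend on your environment.
for name in ["numpy", "pandas", "matplotlib"]:
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

The helper only locates the module and does not import it, so the check is quick and has no side effects from running the package's own code.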
GET A TWITTER DEVELOPER ACCOUNT
If you intend to use our “Data Mining Twitter” chapter and any Twitter-based examples in subsequent chapters, apply for a Twitter developer account. Twitter now requires registration for access to their APIs. To apply, fill out and submit the application at
Twitter reviews every application. At the time of this writing, personal developer accounts were being approved immediately and company-account applications were taking from several days to several weeks. Approval is not guaranteed.
INTERNET CONNECTION REQUIRED IN SOME CHAPTERS
While using this book, you’ll need an Internet connection to install various additional Python libraries. In some chapters, you’ll register for accounts with cloud-based services, mostly to use their free tiers. Some services require credit cards to verify your identity. In a few cases, you’ll use services that are not free. In these cases, you’ll take advantage of monetary credits provided by the vendors so you can try their services without incurring charges. Caution: Some cloud-based services incur costs once you set them up. When you complete our case studies using such services, be sure to promptly delete the resources you allocated.
SLIGHT DIFFERENCES IN PROGRAM OUTPUTS
When you execute our examples, you might notice some differences between the results we show and your own results:
- Due to differences in how calculations are performed with floating-point numbers (like –123.45, 7.5 or 0.0236937) across operating systems, you might see minor variations in outputs, especially in digits far to the right of the decimal point.
- When we show outputs that appear in separate windows, we crop the windows to remove their borders.
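The floating-point variations mentioned above come from the fact that most decimal fractions cannot be stored exactly in binary. The following minimal sketch shows the effect on any platform, along with two common ways to compare or display such values without depending on the last few digits.

```python
import math

total = 0.1 + 0.2

# The stored result is very slightly different from 0.3.
print(total == 0.3)              # False

# Compare with a tolerance instead of testing for exact equality.
print(math.isclose(total, 0.3))  # True

# Round when displaying so that trailing digits do not matter.
print(f"{total:.2f}")            # 0.30
```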
GETTING YOUR QUESTIONS ANSWERED
Online forums enable you to interact with other Python programmers and get your Python questions answered. Popular Python and general programming forums include:
Also, many vendors provide forums for their tools and libraries. Most of the libraries you’ll use in this book are managed and maintained at github.com. Some library maintainers provide support through the Issues tab on a given library’s GitHub page.
If you cannot find an answer to your questions online, please see our web page for the book at
Our website is undergoing a major upgrade. If you do not find something you need, please write to us directly at juttbadshah1120@gmail.com.
You’re now ready to begin reading Python for Programmers. We hope you enjoy the book!
