IT’S THE LIBRARIES!
Throughout the article, we focus on using existing libraries to help you avoid “reinventing the wheel,” thus leveraging your program-development efforts. In this article, you’ll use a broad range of Python Standard Library modules, data-science libraries and third-party libraries.
Python Standard Library
The Python Standard Library provides rich capabilities for text/binary data processing, mathematics, functional-style programming, file/directory access, data persistence, data compression/archiving, cryptography, operating-system services, concurrent programming, interprocess communication, networking protocols and more.
Some of the Python Standard Library modules we use in the article
- collections—Additional data structures beyond lists, tuples, dictionaries and sets.
- csv—Processing comma-separated value files.
- datetime, time—Date and time manipulations.
- decimal—Fixed-point and floating-point arithmetic, including monetary calculations.
- math—Common math constants and operations.
- os—Interacting with the operating system.
- queue—First-in, first-out data structure.
- random—Pseudorandom numbers.
- re—Regular expressions for pattern matching.
- sqlite3—SQLite relational database access.
- string—String processing.
- sys—Command-line argument processing; standard input, standard output and standard error streams.
- timeit—Performance analysis.
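As a quick illustration, the following sketch exercises several of the modules listed above: re and collections for word counting, decimal for exact monetary sums, queue for first-in, first-out ordering, csv for parsing comma-separated text and sqlite3 for an in-memory database. The helper functions (count_words, monetary_total, fifo_order, parse_csv, store_and_query) are hypothetical names chosen for this example; they are not part of the standard library.

```python
"""Hypothetical helpers that exercise a few Python Standard Library modules."""
import collections
import csv
import decimal
import io
import re
import sqlite3
from queue import Queue


def count_words(text):
    """Tokenize with re, then tally occurrences with collections.Counter."""
    words = re.findall(r"[a-z']+", text.lower())
    return collections.Counter(words)


def monetary_total(prices):
    """Sum price strings exactly with decimal.Decimal.

    With floats, 0.1 + 0.2 evaluates to 0.30000000000000004;
    Decimal keeps the result at exactly 0.30.
    """
    return sum((decimal.Decimal(p) for p in prices), decimal.Decimal("0.00"))


def fifo_order(items):
    """Enqueue items in a queue.Queue and dequeue them in first-in, first-out order."""
    q = Queue()
    for item in items:
        q.put(item)
    result = []
    while not q.empty():
        result.append(q.get())
    return result


def parse_csv(text):
    """Parse comma-separated text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def store_and_query(rows):
    """Insert (name, kind) rows into an in-memory SQLite table and return
    the names whose kind is 'stdlib', sorted alphabetically."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE libs (name TEXT, kind TEXT)")
        conn.executemany("INSERT INTO libs VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM libs WHERE kind = ? ORDER BY name", ("stdlib",)
        )
        return [name for (name,) in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_words("the cat and the hat").most_common(1))  # [('the', 2)]
    print(monetary_total(["0.10", "0.20"]))                   # 0.30
    print(fifo_order(["first", "second", "third"]))           # ['first', 'second', 'third']
    print(parse_csv("name,kind\ncsv,stdlib\n"))
    print(store_and_query([("re", "stdlib"), ("numpy", "third-party"), ("csv", "stdlib")]))  # ['csv', 're']
```

The Decimal values are built from strings on purpose: Decimal(0.1) would inherit the float's binary rounding error, which defeats the point of using decimal for monetary calculations.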
Data-Science Libraries
Python has an enormous and rapidly growing community of open-source developers in many fields. One of the biggest reasons for Python’s popularity is the extraordinary range of open-source libraries developed by that community. This article involves you in hands-on data science, key data-science libraries and more. The following list summarizes various popular data-science libraries. For a nice summary of Python visualization libraries see:
Popular Python libraries used in data science
Scientific Computing and Statistics
- NumPy (Numerical Python)—Python does not have a built-in array data structure. It uses lists, which are convenient but relatively slow. NumPy provides the high-performance ndarray data structure to represent one-dimensional arrays (such as vectors) and multidimensional arrays (such as matrices), and it also provides routines for processing such data structures.
- SciPy (Scientific Python)—Built on NumPy, SciPy adds routines for scientific processing, such as integrals, differential equations, additional matrix processing and more. scipy.org controls SciPy and NumPy.
- Pandas—An extremely popular library for data manipulation. Pandas makes abundant use of NumPy’s ndarray.
Visualization
- Matplotlib—A highly customizable visualization and plotting library. Supported plots include regular, scatter, bar, contour, pie, quiver, grid, polar axis, 3D and text.
- Seaborn—A higher-level visualization library built on Matplotlib. Seaborn adds a nicer look-and-feel and additional visualizations, and it enables you to create visualizations with less code.
Machine Learning, Deep Learning and Reinforcement Learning
- scikit-learn—Top machine-learning library. Machine learning is a subset of AI. Deep learning is a subset of machine learning that focuses on neural networks.
- Keras—One of the easiest-to-use deep-learning libraries. Keras runs on top of TensorFlow (Google), CNTK (Microsoft’s cognitive toolkit for deep learning) or Theano (Université de Montréal).
- TensorFlow—From Google, this is the most widely used deep-learning library. TensorFlow works with GPUs (graphics processing units) or Google’s custom TPUs (Tensor Processing Units) for performance. You’ll use the version of Keras that’s built into TensorFlow.
- OpenAI Gym—A library and environment for developing, testing and comparing reinforcement-learning algorithms.
Natural Language Processing
- NLTK (Natural Language Toolkit)—Used for natural language processing (NLP) tasks.
- TextBlob—An object-oriented NLP text-processing library built on the NLTK and pattern NLP libraries. TextBlob simplifies many NLP tasks.
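- Gensim—Similar to NLTK. Commonly used to build an index for a collection of documents, then determine how similar another document is to each of those in the index.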
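To make the NumPy entry concrete, here is a minimal sketch, assuming NumPy is installed (for example, via pip install numpy). It squares values with a list comprehension and with a single ndarray expression, multiplies two small matrices, and uses the standard library's timeit module for a rough speed comparison. The function names are hypothetical, chosen only for this illustration.

```python
"""Sketch: Python lists versus NumPy ndarrays (assumes NumPy is installed)."""
import timeit

import numpy as np


def squares_with_list(values):
    """Square each element in an interpreted Python loop (list comprehension)."""
    return [v * v for v in values]


def squares_with_ndarray(values):
    """Square all elements in one vectorized ndarray expression;
    NumPy performs the element-by-element loop in compiled code."""
    return np.array(values) ** 2


def matrix_product(a, b):
    """Multiply two 2-D arrays (matrices) with the @ operator."""
    return np.array(a) @ np.array(b)


if __name__ == "__main__":
    print(squares_with_list([0, 1, 2, 3, 4]))      # [0, 1, 4, 9, 16]
    print(squares_with_ndarray([0, 1, 2, 3, 4]))   # [ 0  1  4  9 16]
    print(matrix_product([[1, 2], [3, 4]], [[1, 2], [3, 4]]).tolist())  # [[7, 10], [15, 22]]

    # Rough timing comparison; absolute numbers depend on the machine.
    big_list = list(range(100_000))
    big_array = np.array(big_list)
    t_list = timeit.timeit(lambda: [v * v for v in big_list], number=20)
    t_array = timeit.timeit(lambda: big_array ** 2, number=20)
    print(f"list: {t_list:.4f}s  ndarray: {t_array:.4f}s")
```

On typical hardware the ndarray version is considerably faster, which is the practical reason libraries such as SciPy and Pandas build on NumPy’s ndarray rather than on Python lists.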