5 High Quality Open Source Projects to Learn Python

Python is a simple, easy-to-use, object-oriented programming language. It is now very popular thanks to its concise syntax and easy-to-learn concepts.

Once you have learned the basics and are capable of writing Python programs, the next question is how to learn the Pythonic way of writing large projects. This is where high-quality open source Python projects are a huge help. By studying these large programs you can get a good understanding of large program structure, Pythonic code conventions and how to organise documentation for a project.

In this article, I provide a brief overview of some of the high-quality large Python projects available on GitHub. I have chosen these across different technical domains so that you can pick a subset based on your area of interest. For each one, I briefly explain the project and the benefits of studying its structure. Wish you all the best in your journey to become an advanced Python developer!

Flask Python Web Framework

Flask is a lightweight micro web framework written in Python, used for building web applications. Originally started as a joke, the project is now one of the leading Python projects on GitHub. The framework makes it easy to start developing web applications while offering enough flexibility to build complex ones. Flask doesn't enforce any dependencies or project layout. However, there are extensions available which make adding new features easy.

Flask is a wrapper around Werkzeug, a comprehensive WSGI web application library, and the Jinja template engine. So to understand the Flask framework, you need to explore these projects as well.

If you are an intermediate Python developer, I highly recommend looking at the Flask project. It shows how sophisticated features can be offered to the library user through a simple interface. The Flask framework provides support for routing, request handling, configuration, error handling, sessions, templates, logging and security. It also comes with a comprehensive test suite.
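
To make the "simple interface" idea concrete, here is a minimal sketch of decorator-based routing using only the standard library. It is a toy illustration of the pattern, not Flask's actual implementation; the names TinyApp, route and dispatch are hypothetical and chosen just for this example.

```python
class TinyApp:
    """Toy application object (hypothetical; not Flask's real code)."""

    def __init__(self):
        self._routes = {}  # maps a URL path to its view function

    def route(self, path):
        """Return a decorator that registers a view function for `path`."""
        def decorator(view):
            self._routes[path] = view
            return view  # hand the function back unchanged
        return decorator

    def dispatch(self, path):
        """Call the view registered for `path`, or return a 404 pair."""
        view = self._routes.get(path)
        if view is None:
            return 404, "Not Found"
        return 200, view()


app = TinyApp()


@app.route("/")
def index():
    return "Hello, World!"
```

The library user only writes a function and a one-line decorator; the bookkeeping of which path maps to which function stays hidden inside the application object. When reading the Flask source, it is worth looking for how the real framework layers URL matching, request objects and error handling behind a similarly small surface.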

Redis Python Client (redis-py)

Redis, which stands for Remote Dictionary Server, is a fast, open source, in-memory, key-value data store. The project started when Salvatore Sanfilippo wanted to improve the scalability of applications he worked on. He developed Redis, which is now used as a database, cache, message broker, and queue. Redis delivers very fast response times, enabling millions of requests per second for real-time applications.

The Redis Python Client (redis-py) is the Python interface to the Redis key-value store. I recommend going through the source code of this project since it shows you how to develop a Python client for an application with a known interface specification. It also has a Redis command parser, showing you the inner workings of a real command parser implementation. The project also demonstrates the use of the asyncio library.
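
As a rough idea of what "a client for a known interface specification" involves, the sketch below encodes a command and parses a few reply types following the Redis serialization protocol (RESP). It is a simplified illustration and not redis-py's code; the function names encode_command and parse_reply are hypothetical, and array replies and partial reads are deliberately left out.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))  # bulk string
    return b"".join(parts)


def parse_reply(raw):
    """Parse one simple-string, error, integer, or bulk-string reply."""
    kind, rest = raw[:1], raw[1:]
    line, _, tail = rest.partition(b"\r\n")
    if kind == b"+":  # simple string, e.g. +OK
        return line.decode("utf-8")
    if kind == b"-":  # error reply
        raise RuntimeError(line.decode("utf-8"))
    if kind == b":":  # integer reply
        return int(line)
    if kind == b"$":  # bulk string; length -1 means "no value"
        length = int(line)
        return None if length == -1 else tail[:length]
    raise ValueError("unsupported reply type: %r" % kind)
```

For example, encode_command("SET", "key", "value") produces the bytes a client would write to the socket, and parse_reply(b"+OK\r\n") turns the server's answer into a Python string. The real client adds connection handling, pooling and many more reply types on top of this basic exchange.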

Python Requests HTTP Library (requests)

Requests is a simple, yet elegant, HTTP library. Python natively contains a package called urllib for making HTTP requests. However, using urllib for HTTP in a large project requires you to write a fair amount of wrapper code. The requests library provides a higher-level abstraction than urllib, thereby simplifying HTTP handling in a project. The following are some of the additional features offered by the requests library:

  • Support for RESTful APIs
  • Built-in JSON decoder
  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence
  • Browser-style SSL Verification
  • Basic/Digest Authentication

I highly recommend studying the requests source code in detail. It shows you how to offer a better library for a feature that already exists. The idea is to give the developer a simplified interface by supplying sensible defaults and additional features on top of an existing library. Since this library is relatively simple, you can focus on learning the basics of Python in practical usage without getting too distracted by advanced concepts.
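
The idea of layering defaults over an existing library can be sketched with the standard library alone. The example below is not how requests is implemented; it is a small hypothetical wrapper (build_request, post and DEFAULT_HEADERS are names invented here) that prepares a urllib request with default headers and JSON encoding, without sending anything over the network.

```python
import json
import urllib.request

# Assumed defaults for illustration only.
DEFAULT_HEADERS = {"User-Agent": "tiny-http/0.1", "Accept": "*/*"}


def build_request(method, url, headers=None, json_body=None):
    """Build a urllib Request with defaults applied; does not send it."""
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})  # caller-supplied headers win
    data = None
    if json_body is not None:
        data = json.dumps(json_body).encode("utf-8")
        merged.setdefault("Content-Type", "application/json")
    return urllib.request.Request(
        url, data=data, headers=merged, method=method.upper()
    )


def post(url, **kwargs):
    """Convenience helper so callers do not spell out the method."""
    return build_request("POST", url, **kwargs)
```

A caller can write post("https://example.com/api", json_body={"a": 1}) and get a fully prepared request, where plain urllib would need the encoding and header setup repeated at every call site. While reading the requests source, watch for the same pattern at a larger scale: small helper functions in front, with sessions and adapters doing the heavier work behind them.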

Pandas Python Library

Pandas is a flexible, powerful, fast and easy-to-use data analysis and manipulation tool built on Python. It is a very popular Python library and has been in development since 2008. Pandas offers a rich and simple API for data manipulation and analysis. It also supports multiple data sources such as CSV, JSON and Excel. Internally, Pandas builds on NumPy, the numerical computing library for Python.

Pandas is a sophisticated library and demonstrates a large set of programming techniques in Python. Pandas uses the matplotlib library for its built-in data visualization. If you plan to do any kind of plotting in your app with matplotlib, Pandas provides a good demonstration of its usage. Pandas also shows you how you can build a standard interface around multiple input formats such as Excel, JSON and CSV.
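
The "standard interface around multiple input formats" idea can be illustrated in miniature with the standard library. This is a hypothetical sketch, not Pandas code: read_records and the READERS table are names invented here, and each reader returns a plain list of dictionaries instead of a DataFrame.

```python
import csv
import io
import json


def _read_csv(text):
    """Parse CSV text; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def _read_json(text):
    """Parse JSON text that holds a list of objects."""
    return json.loads(text)


# One table maps a format name to its reader function.
READERS = {"csv": _read_csv, "json": _read_json}


def read_records(text, fmt):
    """Parse `text` in the given format into a list of dict records."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return reader(text)
```

Whatever the input format, the caller gets back the same shape of data, and supporting a new format means adding one function and one table entry. Note that this toy CSV reader does no type inference, so numbers arrive as strings; handling details like that consistently across formats is a large part of what the real library does.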

Keras Deep Learning Library for Python

Keras is a Python-based open source framework for deep learning with neural networks. It acts as an interface for TensorFlow, which is a popular machine learning library developed by Google. The advantage of Keras is that it provides a higher-level abstraction than TensorFlow, and hence it is easier to learn and use. Keras is an API designed for human beings.

With over 50,000 stars, Keras is a very popular Python library on GitHub. It is also a highly active project. The Keras project uses some of the popular open source libraries such as TensorFlow, pandas, SciPy, Pillow and NumPy.

I recommend studying the Keras Python codebase to understand how to design a complex library with a simple and intuitive API that developers can quickly get started with. The Keras project shows you how you can design and develop higher-level abstractions on top of an existing software library. Keras cannot operate on its own; it is a wrapper around TensorFlow that simplifies its usage.