How to Add or Remove Columns from Sqlite Tables

Sqlite is a lightweight and powerful database engine. It is used mobile platforms such as Android and iOS as the default database engine. In many of my Android applications, I use pre-populated sqlite databases for data that changes rarely. As I released new versions of the app, sometimes I had to add, remove or rename columns of sqlite tables with data. Interestingly depending on the version of sqlite used in your machine, there are multiple ways to alter columns of sqlite tables.

If you are using sqlite 3.35 or later you can use the following commands,

  • Add a column - ALTER TABLE [table name] ADD COLUMN [column name]
  • Rename a column - ALTER TABLE [table name] RENAME COLUMN [column name]
  • Remove a column - ALTER TABLE [table name] DROP COLUMN [column name]

However if you are using sqlite versions between 3.25 and 3.35, you have access only to the following commands,

  • Add a column - ALTER TABLE [table name] ADD COLUMN [column name]
  • Rename a column - ALTER TABLE [table name] RENAME COLUMN [column name]

If you are using sqlite version 3.2, you have access only to the following command,

  • Add a column - ALTER TABLE [table name] ADD COLUMN [column name]

None of the above commands are available if you are using sqlite versions prior to 3.2. However you can use a multi-step process given below to migrate tables without loosing data,

  • Rename the current table
  • Create a new table with previous name
  • Copy data from renamed table to the new table
  • Drop the renamed table

Let me illustrate the various options by using a sample scenario. Assume we have a customer table which contains columns id, fullname and email.

CREATE TABLE customer(
 id VARCHAR(10) PRIMARY KEY,
 fullname VARCHAR(128),
 email VARCHAR(128));

Assume we want to drop the email column and then add an address column. In Sqlite 3.35 and above you can use the following commands,

ALTER TABLE customer ADD COLUMN address VARCHAR(128);
ALTER TABLE customer DROP COLUMN address;

In older versions, you can use the following generic approach.

1. Rename the current customer table,

ALTER TABLE customer RENAME TO old_customer;

2. Create a new customer table,

CREATE TABLE customer(
 id VARCHAR(10) PRIMARY KEY,
 fullname VARCHAR(128),
 address VARCHAR(128));

3. Copy data from old table (note that address field will be blank for all rows),

INSERT INTO customer SELECT id, fullname,"" FROM old_customer ;

4. Finally delete the old table,

DROP TABLE old_customer;

Alternatively you can use a tool such as Sqlite Browser to change database structure. In Sqlite browser, click on database structure and then select the table. Click modify table button on top. From the next window you can add, remove or modify order of columns.

How to Capture TCP/IP Network Traffic From Kubernetes Pod

If you have applications running as containers in Kubernetes clusters, you may come across scenarios where you want to monitor or analyse raw network traffic. This is usually needed if your containers are making outbound API calls and you find random connection issues with your API calls. If you are using public cloud providers such as Azure, there are monitoring tools such as Network watcher which can be used to monitor traffic. However in this article I will show you a quick and easy way to monitor the pod network traffic at the TCP/IP level using pod console only.

We will be using a linux command line utility called tcpdump to capture TCP/IP network traffic. Note that API calls using http protocol actually runs on top of TCP/IP. Hence tcpdump provides a more precise view of networking issues at the TCP/IP level.

Step 1: Identify the pod name using the following command in you machine's command line. We will use the pod name to connect to the running pod. The following command assumes you already have kubectl client installed in your machine for accessing kubernetes clusters,

kubectl get pods

Step 2: Use the pod name returned by the above command to connect and get a direct command line access to the running pod's operating system. You need to replace POD_NAME with the name of the pod returned above,

kubectl exec POD_NAME -it -- sh

Step 3: Once you are connected to the pod, run tcpdump --version in the pod to check whether you have the tcpdump command available in your running pod. If not, run one of the following commands to install tcpdump to your running container. Please note that if your container is restarted any time, you will have to run the command again to get tcpdump installed.

If Debian based linux distributions is used in the pod, run follwing command to install tcpdump,

apt-get update && apt-get install tcpdump

If Alpine linux or busybox distributions is used in the pod, run following command to install tcpdump,

apk add tcpdump

Step 4: Now you are all set to capture network packets from the pod. Run the following command on the pod command line to capture network traffic to a file named networkcapture.cap. The -s option with 0 ensures capture of large network packets, -vvv option ensures detailed capture and -w specifies the file name for capture.

tcpdump -s 0 -vvv -w networkcapture.cap

The above command captures detailed logs for all TCP/IP transactions. If you want to limit the traffic to a destination IP or host name (when you are troubleshooting API call issues), you can run the following command,

tcpdump -s 0 -vvv -w networkcapture.cap host IP_OR_HOST

Step 5: When you want to stop the capture, you can press Ctrl-C to break out of tcpdump. Now you have networkcapture.cap available in your pod. Using pwd command find the location of the file and note it down. Use this value to replace REMOTE_POD_PATH_INCLUDING_FILE_NAME in the following command.

pwd

Step 6: Exit from the pod console using exit command. Run the following command from your machine to download the networkcapture.cap file from pod to your local folder.

kubectl cp POD_NAME:REMOTE_POD_PATH_INCLUDING_FILE_NAME LOCAL_PATH_INCLUDING_FILE_NAME

Here is how a sample command looks like,

kubectl cp myapp-5bbc4d64c5-6rsdx:/usr/docker/app/networkcapture.cap /Users/jj/networkcapture.cap

Step 7: You can perform a detailed analysis of the tcpdump capture using the wireshark tool. Download it from here.

MongoDB Date Comparison Query Examples

In this guide, I will show you various examples of date comparison based queries in MongoDB. If you use MongoDB in your projects, I highly recommend bookmarking this page as a quick reference for date comparison queries.

For the following MongoDB query examples, I will use a collection representing products created in an e-commerce system. Since the examples are focused on date comparison, our product collection will be a simple one consisting only of these fields - name, price, createddate, updateddate. By running the following command on MongoDB shell, we will insert few records into the product collection. Note that the date fields also contain time.

db.products.insert({ name: "PS5", 
    price: 400, createddate: ISODate("2022-01-10T08:30:00.000"), 
    updateddate: ISODate("2022-01-15T08:30:00.000")} )

db.products.insert( { name: "XBox X", 
    price: 320, createddate: ISODate("2022-02-15T18:30:00.000"), 
    updateddate: ISODate("2022-03-15T20:30:00.000")} )

db.products.insert( { name: "Nintendo Switch", 
    price: 220, createddate: ISODate("2022-02-10T14:30:00.000"), 
    updateddate: ISODate("2022-03-11T22:30:00.000")} )

MongoDB supports find() command for queries. Alternatively you can also use the powerful MongoDB aggregation pipeline for queries. For each query requirement, I will provide both find() query and aggregation pipeline query. However I recommend learning aggregation queries as it is much more powerful than find(). Also note that the following queries are tested on MongoDB version 4.4.

How to Find Documents Between Two Dates in MongoDB

The following query returns products created between 1st January 2022 and 10th February 2022. Note that in this case we are not interested in the time for the dates compared. Following is the find() query to find documents within 1st January 2022 and 10th February 2022,

db.getCollection('products').find(
    {createddate:{$gt:ISODate("2022-01-01"),
    $lt:ISODate("2022-02-10")}})

Following is the aggregate() pipeline query,

db.getCollection('products').aggregate(
    [{"$match" : {"createddate" : {"$gt":ISODate("2022-01-01"), 
    "$lt":ISODate("2022-02-10") } }}]);

The above queries return 1 record from the sample data we have added. Please note that when time is not specified, it is assumed to be 12AM UTC time and hence the record with date 2022-02-10T14:30:00.000 is not returned in the query.

The following find() and aggregation queries demonstrate the use of time in ISODate() and also how to filter queries with additional conditions. We want only PS5 products to be returned within the date range.

db.getCollection('products').find(
    {name:"PS5",
    createddate:{$gt:ISODate("2022-01-01T10:10:40"),
    $lt:ISODate("2022-02-10T22:20:01")}});
db.getCollection('products').aggregate([
    {"$match" : {"name":"PS5"}},
    {"$match" : {"createddate" : {"$gt":ISODate("2022-01-01"), 
    "$lt":ISODate("2022-02-10") } }}
]);

How to Find Documents Using Month or Year Comparison in MongoDB

Sometimes you may want to find all the documents where one of the date fields matches a specific month or year. Following aggregate query uses date expression operator to find all the products updated in the month of March. This query first adds a virtual field called month extracting the month field from the updateddate column. It is then used in the match pipeline.

db.getCollection('products').aggregate([
    {$addFields: {  "month" : {$month: '$updateddate'}}},
    {$match: { month: 3}}]);

Following find() query returns all products updated in the month of March,

db.getCollection('products').find({
  $expr: {
        "$eq": [{"$month": "$updateddate"},3]
  }
})

You can filter on the year field of updated date by replacing $month with $year.

How to Run Selenium Python in MacOS Using Chromedriver

Selenium is a popular browser automation tool. It uses a browser client to simulate traffic to a website. It is commonly used for automating browser based web application testing and scrapping content from websites. Selenium is an open source library and it is available for use with multiple programming languages such as JavaScript, Python, Ruby, Java, Kotlin and C#. In this guide, I will show you how you can run Selenium python bindings in MacOS using Chrome browser and chromedriver.

MacOS doesn't have a native package manager. However, an open source package management tool called Homebrew is available for MacOS. We will use it to install various packages needed to get selenium up and running.

Step 1: Install Homebrew package for MacOS. See https://brew.sh/ for latest instructions on installing brew. As of writing this article, the following command will install brew,

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 2: If you don't have Chrome browser already installed, visit Chrome hompage for download and installation.

Step 3: Install python3 on your machine if it is not already available by running the following command,

brew install python

Step 4: Install selenium from MacOS command line. This assumes you already have python3 installed in your Mac machine.

sudo pip3 install selenium

Step 5: Install chromedriver using the following command. Chromedriver is a separate standalone executable that Selenium Webdriver uses to connect to and control browser actions.

brew install chromedriver --cask

By default chromedriver installed in the folder /usr/local/Caskroom/chromedriver/104.0.5112.79/chromedriver in MacOS. Please note that 104.0.5112.79 may be different as it indicates the version of the chromedriver.

Step 6: Remove quarantine for chromedriver. This is essential since by default MacOS won't allow running unsigned executables. This requires you to find the folder where chromedriver is installed and then run the following command within that folder (/usr/local/Caskroom/chromedriver/REPLACE WITH VERSION/chromedriver),

xattr -d com.apple.quarantine chromedriver

Step 7: Verify everything is setup properly. Run python3 command and then type import selenium in python3 console. If you don't see any error you are all set for writing selenium automation scripts in python!

Step 8: Write a sample python selenium script to get Google search results. Save the following program in a file named googledemo.py,

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

# connect to google.com
driver = webdriver.Chrome(options=Options())
driver.get("https://www.google.com")

# find the search input field
search_field = driver.find_element(By.TAG_NAME, "input")

# type the search string
search_field.send_keys("selenium")

# send enter key to get the search results!
search_field.send_keys(Keys.ENTER)

The above selenium python script performs a Google search for the word "selenium".

Step 9: Run the sample Google search selenium script using the following command,

python3 googledemo.py

Selenium is also useful in scenarios where simple requests package based python web scrapping is blocked due to silent JavaScript challenges such as Google reCaptcha. However some of the advanced bot protection tools are capable of detecting Selenium based automation and preventing site access.

Common Issues in Running Selenium Python on MacOS

Chrome browser and chromedriver version incompatibility.
It is important that the chromedriver version and the Chrome browser version are compatible. If they are not compatible, you will get the following message when you run a Selenium script,

selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version

If you get this error, you can upgrade both Chrome browser and the corresponding chromedriver. Use the following command to upgrad chromedriver to the latest version.

brew upgrade chromedriver

Specific version of chromedriver is available here. This is useful if you want to download chromedriver version corresponding to the Google Chrome installed (if you don't want to upgrade the browser).

MacOS refusing to run chromedriver.
If you get the error "chromedriver" can't be opened because Apple cannot check it for malicious software, ensure that you have run the quarantine removal command xattr -d com.apple.quarantine chromedriver.

Recent MacOS versions such as Ventura, Monterey, Big Sur etc. includes a technology called gatekeeper which ensures only trusted software is run on the system. So if you run any software explicitly signed by Apple, you will get the above error message.

5 High Quality Open Source Projects to Learn Python

Python is a simple and easy to use object oriented programming language. Python is now very popular thanks to its concise syntax and easy to learn concepts.

Once you learn the basics and is capable of writing python programs, the next question is how to learn the pythonic way of writing large projects. This is where high quality python open source projects are of huge help. By studying these large programs you can get good understanding of large program structure, pythonic code conventions and how to organise documentation for a project.

In this article, I provide a brief overview some of the high quality large python projects available on Github. I have chosen these across different technical domains so that you can pick a subset based on your interest area for python. I briefly explain the project and the benefits of learning the structure of the project. Wish you all the best in your journey to become an advanced python developer!

Flask Python Web Framework

Flask is a lightweight micro web framework written in Python used for building Web application. Originally started as a joke, the project is one of the leading Python projects on Github. The framework makes it easy to start developing web applications with enough flexibility to build complex web applications. Flask doesn't enforce any dependencies or project layout. However there are extensions available which makes adding new features easy.

Flask is a wrapper around the comprehensive WSGI web application library Werkzeug and Jinja template engine. So to understand Flask framework, you need to explore these projects as well.

If you are an intermediate Python developer, I highly recommend looking at the Flask project. It shows how sophisticated features can be provided with a simple interface to the library user. Flask framework provides Web support for routing, request handling, configuration, error handling, sessions, templates, logging and security. It also comes with a comprehensive test suite.

Redis Python Client(redis-py)

Redis, which stands for Remote Dictionary Server, is a fast, open source, in-memory, key-value data store. The project started when Salvatore Sanfilippo wanted to improve the scalability of applications he worked on. He developed Redis, which is now used as a database, cache, message broker, and queue. Redis delivers very fast response times, enabling millions of requests per second for real-time applications.

Redis Python Client(redis-py) is the implementation of the Python interface to the Redis key-value store. I recommend going through the source code of this project since it shows you how to develop a python client project for an application with known interface specifications. It also has a redis command parser showing you the inner workings of a real command parser implementation. Project also demonstrates the use of asyncio library.

Python Requests HTTP Library(requests)

Requests is a simple, yet elegant, HTTP library. Python natively contains a package called urllib for making HTTP requests. However using urllib for HTTP in a large project will require you to write a bit of wrapper code. The requests library provides a higher abstraction than urllib there by simplifying the HTTP requirements in a project. Following are some of the additional features offered by the requests library,

  • Support for restful API
  • Built-in JSON decoder
  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence
  • Browser-style SSL Verification
  • Basic/Digest Authentication

I highly recommend studying the requests source code in detail. It shows you how to offer a better library for a feature that already exists. The idea is to provide a simplified interface to the developer by providing defaults and providing additional features on top of an existing library. Since this library is relatively simple, you can focus on learning the basics of python in practical usage without getting to much distracted by advanced concepts.

Pandas Python Library

Pandas is a flexible, powerful, fast and easy to use data analysis and manipulation tool built on python. It is a very popular python library and has been in development since 2008. Pandas offers a rich and simple API for data manipulation and analysis. It also supports multiple data sources such as CSV, JSON, Excel etc. Pandas internally uses the mathematical python library NumPy.

Pandas is a sophisticated library and demonstrates a large set of programming techniques in python. Pandas internally uses matplotlib library for visualization of data. If you plan to use any kind of plotting in your app with matplotlib, pandas provide a good demonstration of its usage. Pandas also show you how you can build a standard interface around multiple input formats such as excel, json and csv.

Keras Deep Learning Library for Python

Keras is a python based open source framework for neural network based deep learning. It acts as an interface for Tensorflow, which is a popular machine learning library developed by Google. The advantage of Keras is that it provides a higher abstraction than Tensorflow and hence it is easy to learn and use. Keras is an API designed for human beings.

With over 50,000 stars, Keras is a very popular python library on Github. It is also a highly active project. Keras project uses some of the popular open source frameworks such as tensorflow, pandas, scipy, pillow and numpy.

I recommend studying Keras python codebase to understand how to design a complex library with simple and intuitive API that developers can quickly start with. Keras project shows you how you can design and develop higher level abstractions from an existing software library. Keras cannot operate on its own, it is just a wrapper around Tensorflow simplifying its usage.