How to Change Content Type of Azure Storage Blobs Using Python SDK

The Microsoft Azure cloud platform offers a comprehensive portfolio of storage services, each suited to specific use cases. Here is a quick overview of the Azure storage services:

  • Azure Storage Account - A storage account can be used for a wide variety of workloads. Within an account you can create blob storage
    or file storage. Blob storage is suitable for storing large sets of unstructured data (images, for example). File storage makes content available over the SMB protocol. A storage account can also hold storage queues and tables. For analytical workloads, a storage account can be configured as Azure Data Lake Storage Gen2 by enabling the hierarchical namespace. Finally, blobs can be moved to the archive access tier, a lower-cost option for rarely accessed data.
  • Azure Managed Disks - High-performance block storage for virtual machines, available as Ultra Disk, Premium SSD, Standard SSD and Standard HDD.
  • Azure HPC Cache - Backed by storage such as Azure Blob storage, this service caches files in the cloud for high-throughput workloads.

Of these services, the Azure storage account is the most widely useful. Blob storage in particular suits a wide range of use cases involving large sets of unstructured binary data such as images. It can also integrate with a CDN for faster content delivery.

There are four common ways of accessing Azure blob storage:

  • Azure portal - The web-based management console
  • Azure CLI - Cross-platform command-line interface
  • Azure PowerShell - PowerShell-based command-line interface
  • Azure SDKs (Python, .NET, etc.) - Client libraries for various programming languages

In this article, we look at accessing blob storage using Python. For Azure storage services, there are two Python SDK options:

  • Azure Python SDK v2.1 (legacy)
  • Azure Python SDK v12

Since Azure Python SDK v2.1 is deprecated, we will use Microsoft Azure Python SDK v12 for the following examples. Python 3.6 or above was required for these examples at the time of writing; recent releases of the azure-storage-blob package require a newer Python 3 version.


Before using the Python SDK for Azure storage, ensure that the following prerequisites are installed.
Step 1: Install Python 3.6 or above. On macOS, you can use Homebrew to install Python 3:

brew install python3

Step 2: Install the Azure Blob Storage client library for Python:

pip3 install azure-storage-blob --user
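To confirm both prerequisites before running the program, a small check like the one below can help. It is a sketch using only the standard library; the helper names python_is_supported and installed_version are illustrative, and importlib.metadata itself needs Python 3.8 or above (on older versions, pip3 show azure-storage-blob gives the same information).

```python
# Hypothetical helper script: verify the prerequisites for the examples in this article.
# Uses only the standard library; importlib.metadata requires Python 3.8 or above.
import sys
from importlib.metadata import PackageNotFoundError, version


def python_is_supported(minimum=(3, 6)):
    """Return True if the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


def installed_version(distribution_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} supported: {python_is_supported()}")
    blob_version = installed_version("azure-storage-blob")
    if blob_version is None:
        print("azure-storage-blob is not installed; run: pip3 install azure-storage-blob --user")
    else:
        print(f"azure-storage-blob {blob_version} is installed")
```

If the reported azure-storage-blob version starts with 12, the v12 API used below is available.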

Run the following program to change the content type of all files with the .jpg extension to image/jpeg in Azure blob storage using the Python SDK. You can adapt the same program to change the content type, content encoding, content MD5 or cache control of the blobs. See the ContentSettings class in the azure-storage-blob reference documentation for the full set of content_settings attributes.

The program assumes that an Azure storage account exists with a container holding image files whose content type was incorrectly set to application/octet-stream.

# Python program to change the content type of .jpg files to image/jpeg in Azure blob storage
# This is useful if you accidentally uploaded .jpg files with the application/octet-stream content type
# Requires Python 3.6 or above and the azure-storage-blob (v12) package
from azure.storage.blob import BlobServiceClient

# IMPORTANT: Replace the connection string with your storage account connection string
# This usually starts with DefaultEndpointsProtocol=https;..
MY_CONNECTION_STRING = "REPLACE_THIS"
MY_IMAGE_CONTAINER = "myimages"


class AzureBlobImageProcessor:
    def __init__(self):
        print("Initializing AzureBlobImageProcessor")

        # Ensure that the container exists in the storage account (myimages in this example)
        service_client = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
        self.container_client = service_client.get_container_client(MY_IMAGE_CONTAINER)

    def change_content_type_for_jpg_files(self):
        print("Changing content type for all files with the .jpg extension")

        # You can optionally pass a blob name prefix to list_blobs() (name_starts_with)
        # This is useful if you want to process a large number of files:
        # you can run multiple instances of this program with different prefixes!
        blob_list = self.container_client.list_blobs()
        file_count = 0
        for blob in blob_list:
            # Match only names that end with .jpg, not names that merely contain ".jpg"
            if not blob.name.endswith(".jpg"):
                continue

            # Print the file name and current content type to monitor progress
            print(f"For file {blob.name} current type is {blob.content_settings.content_type}")

            # In addition to content_type, you can also set content_encoding,
            # content_language, content_disposition, cache_control and content_md5.
            # set_http_headers() replaces all of these headers, so pass the blob's
            # existing content_settings with only content_type changed.
            blob.content_settings.content_type = "image/jpeg"
            self.container_client.get_blob_client(blob).set_http_headers(blob.content_settings)
            file_count += 1

        print(f"Changing content type completed. Processed {file_count} files")


if __name__ == "__main__":
    # Initialize the class and change the content type for all .jpg files
    azure_blob_image_processor = AzureBlobImageProcessor()
    azure_blob_image_processor.change_content_type_for_jpg_files()

Save the program above as change_blob_content_type.py and run the following command. Don't forget to update the MY_CONNECTION_STRING and MY_IMAGE_CONTAINER variables before running the program:

python3 change_blob_content_type.py
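Editing the script to paste in a connection string makes it easy to commit a secret to source control. A common alternative, sketched here as an assumption rather than part of the original program, is to read the connection string from an environment variable and take the container name and an optional blob name prefix from the command line. The variable name AZURE_STORAGE_CONNECTION_STRING and the --container and --prefix flags are illustrative choices. The check only confirms that the string looks like a key=value connection string with an AccountName or BlobEndpoint; it does not prove the credentials are valid.

```python
# Hypothetical configuration helper: the environment variable name and the
# --container/--prefix flags are illustrative, not part of the original program.
import argparse
import os


def parse_connection_string(conn_str):
    """Split 'Key1=value1;Key2=value2;...' into a dict and do a basic sanity check.

    Each segment is split on its first '=' only, because values such as
    account keys often end in '='.
    """
    settings = {}
    for part in conn_str.strip().split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Malformed connection string segment: {part!r}")
        settings[key] = value
    if "AccountName" not in settings and "BlobEndpoint" not in settings:
        raise ValueError("No AccountName or BlobEndpoint found; was REPLACE_THIS updated?")
    return settings


def load_settings(argv=None, environ=None):
    """Return (connection_string, container_name, prefix) from the environment and argv."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Change the content type of Azure blobs")
    parser.add_argument("--container", default="myimages")
    parser.add_argument("--prefix", default=None,
                        help="only process blobs whose names start with this prefix")
    args = parser.parse_args(argv)
    conn_str = environ.get("AZURE_STORAGE_CONNECTION_STRING", "")
    parse_connection_string(conn_str)  # raises ValueError if it looks wrong
    return conn_str, args.container, args.prefix
```

The program could then call BlobServiceClient.from_connection_string(conn_str), get_container_client(container) and list_blobs(name_starts_with=prefix); a prefix of None lists every blob in the container.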

I used this program to fix around 200,000 image files that had been wrongly uploaded as application/octet-stream. To speed up processing, I ran it from a VM in the same Azure region as the storage account. Running it from a local machine can take much longer, since the program makes a separate request to the storage service for every blob it updates. The program works well for bulk-updating the content type of Azure blobs.
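Besides running close to the storage account, another way to cut the total time is to send several update requests at once, since a sequential loop spends much of its time waiting for each response. The sketch below is an assumed alternative to the sequential loop, not what was used for the run described above. It uses the standard library's ThreadPoolExecutor and expects a v12 ContainerClient-like object (get_blob_client() returning a client with set_http_headers()) plus BlobProperties-like items; the function name update_content_types and the default of 16 workers are arbitrary, and too many concurrent requests can be throttled by the storage service.

```python
from concurrent.futures import ThreadPoolExecutor


def update_content_types(container_client, blobs, target_type="image/jpeg", max_workers=16):
    """Set target_type on each blob using a thread pool; return the number of blobs updated.

    `blobs` is an iterable of BlobProperties-like objects (name + content_settings),
    for example a filtered result of container_client.list_blobs().
    """

    def update_one(blob):
        # Reuse the blob's existing settings: set_http_headers() replaces all HTTP headers.
        blob.content_settings.content_type = target_type
        container_client.get_blob_client(blob).set_http_headers(blob.content_settings)
        return blob.name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming map() re-raises the first exception raised by any worker.
        return sum(1 for _name in pool.map(update_one, blobs))
```

For example, update_content_types(container_client, (b for b in container_client.list_blobs() if b.name.endswith(".jpg"))). Executor.map() submits every item up front, so for very large containers it may be preferable to process one prefix at a time.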

If you can partition the files in the container by blob name prefix, you can run the content type conversion in parallel: start several instances of the program and pass each one a different prefix through the name_starts_with parameter of list_blobs(). For example, assume the container has blobs with the following full names:

  • subset1/a.jpg
  • subset1/b.jpg
  • subset2/c.jpg
  • subset2/d.jpg

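You can then run two instances of the program, one with list_blobs(name_starts_with="subset1/") and one with list_blobs(name_starts_with="subset2/"), to speed up the conversion. Including the trailing "/" keeps a prefix such as subset1 from also matching blobs under subset10/.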
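The same idea can also be driven from a single process instead of separate program instances. This sketch is an assumption, not the author's setup. It derives first-level "folder" prefixes from blob names (against a real container, ContainerClient.walk_blobs(delimiter="/") can list just that first level) and hands each prefix to a worker. Here process_prefix is a hypothetical callable, for example one that calls list_blobs(name_starts_with=prefix) and updates the matching blobs. Each prefix keeps its trailing "/" for the reason given above.

```python
from concurrent.futures import ThreadPoolExecutor


def top_level_prefixes(blob_names, delimiter="/"):
    """Return sorted first-level prefixes such as 'subset1/'; names without a delimiter are skipped."""
    prefixes = set()
    for name in blob_names:
        head, sep, _rest = name.partition(delimiter)
        if sep:
            prefixes.add(head + sep)
    return sorted(prefixes)


def run_per_prefix(prefixes, process_prefix, max_workers=4):
    """Call process_prefix(prefix) for each prefix (a list) in parallel; return {prefix: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(process_prefix, prefixes)
        return dict(zip(prefixes, results))


if __name__ == "__main__":
    names = ["subset1/a.jpg", "subset1/b.jpg", "subset2/c.jpg", "subset2/d.jpg"]
    prefixes = top_level_prefixes(names)
    print(prefixes)  # ['subset1/', 'subset2/']
    # Stand-in for the real work: count the blob names under each prefix
    print(run_per_prefix(prefixes, lambda p: sum(n.startswith(p) for n in names)))
```

Threads are enough here because each worker mostly waits on network requests; running separate processes, as described above, works just as well and also spreads the printing and bookkeeping across CPUs.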