How to Store a File as an Azure Vault Secret

Azure Key Vault offers a highly secure way of storing secrets used in your applications. It natively supports encryption keys, SSL/TLS certificates, and simple text secrets such as passwords. Key Vault secrets are especially easy to use in applications deployed under Azure App Service: you simply specify a key vault reference in the configuration section of the app service, and the content of the secret becomes available as an environment variable in your application.

Assume that your application reads the secret from an environment variable named app_secret (this could be any name). You will need to add this variable as an application setting named app_secret. Now assume your secret is saved under the name "replace-with-secret-name" in the Azure Key Vault named "replace-with-vault-name". To populate the value of the secret into the application setting app_secret, give the setting the following value,

@Microsoft.KeyVault(VaultName=replace-with-vault-name;SecretName=replace-with-secret-name)

This is known as a key vault reference, which the Azure platform replaces at runtime with the value of the secret. This is a safe and automatic way of providing secrets from Azure Key Vault to the running application through environment variables.
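For example, assuming a Python application deployed to the App Service, reading the secret is then an ordinary environment variable lookup; a minimal sketch,

import os

# app_secret is populated by the App Service from the key vault reference
app_secret = os.environ.get("app_secret")
if app_secret is None:
    raise RuntimeError("app_secret is not set; check the key vault reference")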

How to Store a File as an Azure Vault Secret Using Azure CLI or PowerShell?

One common requirement when writing web applications is to store a private key and a certificate file in PEM format as secrets in Azure Key Vault. This is needed when you don't want a single certificate with the embedded private key in the vault, but rather two separate files, one containing the private key and the other containing the certificate. A typical use case is programmatically connecting to an external API over an SSL/TLS connection.

In such cases you cannot do this from the Azure portal, since it only supports creation of single-line secrets. However, you can use either the Azure CLI or Azure PowerShell to create multi-line secrets from a file.

Here are the steps for saving a file as a vault secret using the Azure CLI.
First, save your private key in PEM format in a text file named key.pem. Then run the following Azure CLI command to save the file as a secret inside your Azure vault,

az keyvault secret set --vault-name "replace-with-vault-name" --name "replace-with-secret-name" --file "key.pem"

Next, save your certificate in PEM format in a text file named cert.pem and run the following Azure CLI command to save it as a second secret under the same vault (note the different secret name, so that the certificate does not overwrite the key),

az keyvault secret set --vault-name "replace-with-vault-name" --name "replace-with-cert-secret-name" --file "cert.pem"

If you are using PowerShell, you first need to convert the file content into a secure string,

$RawSecret = Get-Content "key.pem" -Raw
$SecureSecret = ConvertTo-SecureString -String $RawSecret -AsPlainText -Force

Then use it to store the secret in the Azure vault,

$secret = Set-AzKeyVaultSecret -VaultName "replace-with-vault-name" -Name "replace-with-secret-name" -SecretValue $SecureSecret

Now repeat the same steps for the cert.pem file using Azure PowerShell.
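If you later need to read a secret back as plain text in PowerShell, you can use Get-AzKeyVaultSecret; a quick sketch (the -AsPlainText switch requires a recent version of the Az.KeyVault module),

$PlainSecret = Get-AzKeyVaultSecret -VaultName "replace-with-vault-name" -Name "replace-with-secret-name" -AsPlainText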

Use the following command to verify that the multi-line secret was created properly,

az keyvault secret show --name "replace-with-secret-name" --vault-name "replace-with-vault-name" --query "value"
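If you need the secret contents back as a file, for example on another machine, the Azure CLI can also download it; a sketch using the same placeholder names,

az keyvault secret download --vault-name "replace-with-vault-name" --name "replace-with-secret-name" --file "downloaded-key.pem"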

Note that if you try to delete and recreate an Azure vault secret, you may get an error if soft delete is enabled. In that case you will have to first purge the deleted secret before you can create it again (purging is not possible while purge protection is enabled).
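For example, here is a sketch of purging a soft-deleted secret with the Azure CLI (this only works while purge protection is disabled),

az keyvault secret purge --vault-name "replace-with-vault-name" --name "replace-with-secret-name"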

What is the Maximum Size of a MongoDB Document?

MongoDB is a NoSQL database which stores JSON-like documents in containers known as collections. There is no technical limit on how many documents can be stored in a MongoDB collection. However, current versions of MongoDB have a limit on the maximum size of a single document stored in a collection. Since MongoDB version 1.7.4, the single-document size limit has been 16MB (originally it was 4MB). MongoDB stores JSON-like documents in a binary representation known as BSON. The 16MB size limit applies to the BSON form, not to the JSON representation.

For example, if you try to insert a single document larger than 16MB into MongoDB using the PyMongo library for Python, you will get the following error,

DocumentTooLarge: BSON document too large
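One way to guard against this error is to check the encoded size before inserting. Here is a minimal sketch using the bson module that ships with PyMongo (bson.encode requires PyMongo 3.9 or later; the document content is illustrative),

import bson

doc = {"name": "example", "payload": "x" * 1024}  # illustrative document
size_in_bytes = len(bson.encode(doc))  # size of the BSON representation

# 16MB BSON limit for a single MongoDB document
if size_in_bytes > 16 * 1024 * 1024:
    print("Document exceeds the 16MB limit; redesign the schema or use GridFS")
else:
    print(f"Document size: {size_in_bytes} bytes")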

There are two options to solve the size limit issue: either redesign the schema or use GridFS.

If you need to store documents larger than 16MB in MongoDB, you can use the GridFS specification available in MongoDB. GridFS splits the large file into chunks of 255KB each and stores them in a chunks collection, while the metadata for the file itself is stored in a separate files collection. Hence each entity in GridFS needs two collections for data storage.
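Here is a minimal PyMongo sketch of storing and reading back a large payload with GridFS (the connection details, database name and file name are illustrative),

import gridfs
from pymongo import MongoClient

db = MongoClient()["mydb"]  # illustrative connection and database name
fs = gridfs.GridFS(db)      # backed by the fs.files and fs.chunks collections

# Store a 20MB payload; GridFS splits it into 255KB chunks transparently
file_id = fs.put(b"x" * (20 * 1024 * 1024), filename="large-payload.bin")

# Read it back as a single stream
data = fs.get(file_id).read()
print(len(data))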

Please be aware that the 16MB limit applies to pure MongoDB implementations. Other database platforms that provide MongoDB API compatibility may have a different maximum document size. For example, Azure CosmosDB for MongoDB API has a per-document size limit of 2MB for the JSON representation. This calls for careful design decisions when modelling collections for a CosmosDB database, since a future requirement or a rare data instance may cause writes to fail, and refactoring of collections may then be needed. For example, you may be storing all product IDs under a category in a product-category collection until a super category appears with so many products that the document hits the 2MB limit. One solution in this case is to model each product-to-category association as a separate document!
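As a hypothetical sketch of that refactoring in PyMongo (the collection and field names are illustrative),

from pymongo import MongoClient

db = MongoClient()["catalog"]  # illustrative connection and database name

# Instead of one category document holding an ever-growing array of product IDs,
# store one small document per product-to-category association
db.category_products.insert_one({"category_id": "super-category", "product_id": 12345})

# An index on category_id keeps "all products in a category" queries fast
db.category_products.create_index("category_id")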

The 16MB limit exists to ensure that very large documents do not cause excessive RAM usage or network transfers. If you run into the 16MB limit, it usually means you need to redesign your MongoDB schema or use the GridFS specification. Also note that for deeply nested documents, MongoDB has a limit of 100 levels of nesting.

Here are some tips to analyse and refactor your MongoDB schema if you hit the 16MB limit,

  • Is it possible to logically split the collection containing the document into multiple collections? If needed, data from multiple collections can be combined using the $lookup operator in MongoDB (see the sketch after this list).
  • Can you model the schema such that nested elements or arrays become individual documents? With indexing, multiple documents matching a criterion can be retrieved efficiently. It is usually better to have lots of small documents than a few large ones.
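For the first tip, here is a sketch of joining two collections with $lookup in PyMongo (the collection and field names are illustrative),

from pymongo import MongoClient

db = MongoClient()["shop"]  # illustrative connection and database name

# Join each order with its line items, which were split into a separate collection
orders_with_items = db.orders.aggregate([
    {"$lookup": {
        "from": "order_items",      # the collection split out of the original schema
        "localField": "_id",
        "foreignField": "order_id",
        "as": "items"
    }}
])

for order in orders_with_items:
    print(order["_id"], len(order["items"]))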

NextJS Build Issues with Mac M1 and Docker Desktop

While developing a NextJS web application with a Mac M1 and Docker Desktop, I faced a number of issues. This article summarises the problems and how I solved them.

The first issue I faced was during the Docker image build from the NextJS folder with a Dockerfile. I used the following command to build the container image,

docker build . -t myreg123.azurecr.io/myapp

During the Docker build, the following error was observed while the RUN npm install step was running,

=> [4/9] RUN npm install 324.9s
=> => # #
=> => # # Fatal process OOM in Failed to reserve virtual memory for CodeRange
=> => # #
=> => # qemu: uncaught target signal 5 (Trace/breakpoint trap) - core dumped

It turned out that this is a known issue on Mac M1 machines with older versions of Docker Desktop. Upgrading Docker Desktop to 4.5 solved the issue!

The next issue I faced was when I tried to deploy the Docker image on an Azure AKS cluster running Linux VMs. The container failed to start on AKS with a CrashLoopBackOff error. The interesting thing was that this container image ran without any issues on my local Mac M1. I ran the following kubectl command to find the root cause of the error in AKS,

kubectl --v=8 logs myapp-bc12b6bb2-chbrg

The above command showed the following error text,

standard_init_linux.go:228: exec user process caused: exec format error

Then I realised that the Mac M1 generates container images for the ARM architecture by default! To force creation of a Docker image suitable for Linux on amd64, I used the following modified build command,

docker build --platform=linux/amd64 . -t myreg123.azurecr.io/myapp

This solved the issue and my NextJS app ran without any problems on the AKS cluster.
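As a quick check before pushing an image, you can inspect the OS and architecture it was built for; a sketch using the standard Docker CLI,

docker image inspect myreg123.azurecr.io/myapp --format '{{.Os}}/{{.Architecture}}'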

How to Add robots.txt File to NextJS App

In this article, I will explain the significance of robots.txt in a NextJS project and how you can add both a static and a dynamic robots.txt file to your web app.

NextJS is a popular front-end JavaScript framework built on top of ReactJS. NextJS enables development of fast and responsive web apps with features such as static rendering, image optimisation, page routing, code splitting/bundling and API routes for a fullstack application. It is a highly recommended framework for building any consumer Web application intended for public access.

One of the common requirements for an internet-facing Web application is the robots.txt file. It is a simple text file hosted at the root of the application domain, intended for the various web crawlers that index your site for search. The robots.txt file specifies which parts of your Web application the crawlers are allowed to crawl. Search engines look at robots.txt before crawling and indexing your site. Not all web crawlers support robots.txt, but popular and important ones such as Googlebot do. So if you specify a set of files or folders as disallowed, the Google web crawler will not crawl their content. You can also optionally specify the location of the sitemap file for your website in robots.txt (see the example below).

It is highly recommended to add a robots.txt file to your NextJS web application. There will be components or folders in your NextJS application that you don't want search engines such as Google to index. For example, API routes are usually intended for the private use of your pages, and you don't want search engines or people to access them directly. You can disallow access to such URLs by adding them to the robots.txt file.

How to Add robots.txt File to a NextJS Project

By default, any files added to the public subfolder of a NextJS project are accessible from the root of the application. Hence, if your NextJS application runs on www.example.com, you just need to add robots.txt to the public folder of the project and it will be available at the URL www.example.com/robots.txt. This is the location where web crawlers look for the file.

Here is a sample robots.txt I have in my NextJS project. Note that this file tells the Google crawler to avoid indexing the /api folder, since it contains all my API endpoints intended only for the NextJS app itself. Also note that I have specified the location of the sitemap in this file. This is optional, and you can instead add the path to the sitemap file directly in Google's webmaster console (the obvious disadvantage being that it then applies only to Google).

User-agent: *
Disallow: /api/

Sitemap: https://www.quickprogrammingtips.com/sitemap.xml

How to Create a Dynamic robots.txt in NextJS App?

If you are working with a large project where you need the robots.txt content to be dynamic, you can create a custom rewrite for the robots.txt path in NextJS and then serve dynamic content. For example, add the following to next.config.js. This tells NextJS to return the content from /api/dynamicrobot whenever a browser or crawler requests /robots.txt.

// next.config.js
module.exports = {
    async rewrites() {
        return [
            {
                source: '/robots.txt',
                destination: '/api/dynamicrobot'
            }
        ];
    }
}

Then create an API endpoint JS file (/pages/api/dynamicrobot.js) that dynamically creates the robots.txt content.

// Content of /pages/api/dynamicrobot.js file
export default function handler(req, res) {
    // Logic here to generate dynamic robots.txt file content
    res.setHeader('Content-Type', 'text/plain'); // robots.txt should be served as plain text
    res.send('The full robots.txt file content dynamically created is here.');
}

How to Use Azure CosmosDB Management API SDK for Python

Azure provides an excellent portal for managing various cloud resources. If you work with CosmosDB database accounts, you will be familiar with the powerful interface available in the Azure portal. Sometimes you may need to access some of the CosmosDB management features programmatically. If you plan to build something on the command line, you can use either the Azure CLI or Azure PowerShell to access CosmosDB management features.

However, sometimes you may want direct programmatic access from your preferred programming language. The good news is that Azure API SDKs are available for common languages such as Python, .NET, Java and JavaScript. In the following sections I will show you how to use the powerful Azure management APIs using the Azure SDK for Python. For this example I will specifically use the CosmosDB management API to list CosmosDB accounts and print the connection strings (access keys) for one of the database accounts.

How to List CosmosDB Accounts in an Azure Subscription Using Azure Management API for Python

The following code snippet assumes that you already have the Azure CLI set up, so that user credentials are available through the AzureCliCredential class. See the Azure identity documentation for other methods of providing user credentials or a service principal for API access. If you get the error "SubscriptionNotFound", check the subscription id used in the code and make sure you have logged into the subscription using the Azure CLI.

# This program will list all the CosmosDB database accounts under a subscription.
# Uses Azure Management API for Python
# Also assumes Azure CLI is installed and configured with user authorization. Hence this code doesn't expose Azure user id and password.

from azure.mgmt.cosmosdb import CosmosDBManagementClient
from azure.identity import AzureCliCredential

# Replace the following variable with your subscription id
subscription_id = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

# Connect to CosmosDB management API
cosmosdb_mgmt_client = CosmosDBManagementClient(AzureCliCredential(), subscription_id)

# Query the list of CosmosDB accounts under the subscription specified above
cosmosdb_accounts = cosmosdb_mgmt_client.database_accounts.list()

# For each account, print the account name, id and access endpoint. Note that the object has many more attributes available.
for db_account in cosmosdb_accounts:
    print(f"Name={db_account.name},Id={db_account.id},Endpoint={db_account.document_endpoint}")

# There are additional APIs that can be used to get more specific details of CosmosDB accounts

How to List Connection Strings for a CosmosDB Account Using Azure CosmosDB SDK for Python

The following code snippet again assumes that you already have the Azure CLI set up, so that user credentials are available through the AzureCliCredential class. If you get the error "SubscriptionNotFound", check the subscription id used in the code and make sure you have logged into the subscription using the Azure CLI.

# This program prints the connection strings (access keys) of a CosmosDB database account.
# Uses Azure Management API for Python
# Also assumes Azure CLI is installed and configured with user authorization. Hence this code doesn't expose the Azure user id and password.

from azure.mgmt.cosmosdb import CosmosDBManagementClient
from azure.identity import AzureCliCredential
import re

# Replace the following variable with your subscription id
subscription_id = 'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX'

# Connect to CosmosDB management API
cosmosdb_mgmt_client = CosmosDBManagementClient(AzureCliCredential(), subscription_id)


# The following code snippet shows how to print connection strings with access keys of an account

# Let us get the first account from the accounts list
first_account = next(cosmosdb_mgmt_client.database_accounts.list())
account_id = first_account.id
account_name = first_account.name

# Extract the resource group name from the account id since the next API call needs it
resource_group = re.search('resourceGroups/(.*)/providers', account_id).group(1)


# Call API to list connection strings. Note that the API returns an object
cs_list_obj = cosmosdb_mgmt_client.database_accounts.list_connection_strings(
    resource_group,
    account_name
)

# This lists all 4 connection strings of the CosmosDB account
# This includes primary read-write, secondary read-write, primary read-only and secondary read-only
for connection_string_obj in cs_list_obj.connection_strings:
    print(connection_string_obj.connection_string)

The CosmosDBManagementClient used above is quite powerful and can also be used to get detailed metrics for CosmosDB accounts, databases or collections. This is useful if you plan to build your own monitoring and alerting solution instead of using Azure's built-in services. For one of my solutions I use this API to fetch Max RUs Per Second, Mongo Query Request Charge and Throttled Requests for a CosmosDB account provisioned for the MongoDB API.
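As a rough sketch of such a metrics call, reusing the cosmosdb_mgmt_client, resource_group and account_name variables from the snippet above (treat the exact filter syntax and the metric value attributes as assumptions to verify against the SDK documentation),

# Sketch: fetch the 'Max RUs Per Second' metric for the account
# The filter string follows the Azure Monitor metrics conventions (assumption)
metric_filter = (
    "(name.value eq 'Max RUs Per Second') and timeGrain eq duration'PT1M' "
    "and startTime eq '2023-01-01T00:00:00Z' and endTime eq '2023-01-01T01:00:00Z'"
)
metrics = cosmosdb_mgmt_client.database_accounts.list_metrics(
    resource_group, account_name, metric_filter
)
for metric in metrics:
    # Each metric carries a list of time-stamped values (assumption: maximum attribute)
    print(metric.name.value, [v.maximum for v in metric.metric_values])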
