Today I just wanted to highlight a simple Azure SQL and Azure Queue example that uses Azure Functions to schedule and move data from a REST API into SQL Azure in parallel. It makes heavy use of bindings to keep the code very tight while staying highly functional. My favorite part is the SchedulePokemonQueue function: in 11 lines of code it gets all of the data from the SQL DB and drops each row onto the queue to be processed, with essentially just some scaffolding and a for loop.
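The shape of that pattern (a scheduler that fans rows out as queue messages, and workers that process them in parallel) can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the standard library's `queue.Queue` and a thread pool as stand-ins for Azure Queue storage and Azure Functions bindings, and `ROWS`, `schedule`, `process`, and `drain` are hypothetical names, not the code from the post.

```python
import json
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the rows the scheduler reads from the SQL DB.
ROWS = [
    {"id": 1, "name": "bulbasaur"},
    {"id": 2, "name": "ivysaur"},
    {"id": 3, "name": "venusaur"},
]


def schedule(rows, work_queue):
    """Scheduler step: drop one JSON message per row onto the queue."""
    for row in rows:
        work_queue.put(json.dumps(row))
    return work_queue.qsize()


def process(message):
    """Worker step: placeholder for the REST call and the SQL write."""
    row = json.loads(message)
    return {"id": row["id"], "name": row["name"].upper()}


def drain(work_queue, max_workers=4):
    """Pull every queued message and process them in parallel."""
    messages = []
    while not work_queue.empty():
        messages.append(work_queue.get())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input messages.
        return list(pool.map(process, messages))


work_queue = queue.Queue()
queued = schedule(ROWS, work_queue)
results = drain(work_queue)
```

In the real setup the queue trigger takes the place of `drain`: each message starts its own function invocation, which is where the parallelism comes from.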
User Defined Functions (UDFs) let you easily build logic to process columns in Spark, but they can often be inefficient, especially when written in Python. Scala UDFs are significantly faster than Python UDFs, as in orders of magnitude faster. I recently worked with someone who needed a UDF to process a few hundred GB of data. When we switched from a Python UDF to a prebuilt Scala UDF, processing time went from 8 hours, at which point we gave up, to around 15 minutes. Figuring out how to do this was a challenge, though, so I want to document the process for others.
I recently had an issue come up where a customer was trying to use the Python library twobitreader in a UDF to pull out genetic information for individual genes. Think of it as looking up a range of characters in a file and returning them as a string. The problem they were running into...
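To make the "range of characters from a file" analogy concrete, here is a minimal sketch in plain Python. It is not the twobitreader API, and it assumes a plain ASCII text file rather than the packed binary .2bit format; `read_range` and the demo file are hypothetical and exist only for illustration.

```python
import os
import tempfile


def read_range(path, start, end):
    """Return the characters in [start, end) from an ASCII file as a string."""
    with open(path, "rb") as handle:
        handle.seek(start)  # jump straight to the offset instead of reading the whole file
        return handle.read(end - start).decode("ascii")


# Build a small demo file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ACGTACGTTTGACA")
    demo_path = tmp.name

fragment = read_range(demo_path, 4, 10)
os.remove(demo_path)
```

Seeking to the offset means the cost of a lookup depends on the size of the range, not the size of the file, which matters when the same lookup runs once per row inside a UDF.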
Today, just a quick project I’ve been working on to monitor connectivity and performance for a SQL Azure database. The original goal was to record client errors for a SQL DB to help with some troubleshooting and to show whether there were actual connectivity errors or just false alarms. That led to an overall performance monitoring tool that does basic data capture to help with performance optimization of a database.
I recently saw this blog article from the CTO of Basecamp that is starting to make the rounds in the cloud computing world, and thought it would be worth bringing up. It makes some points that come up often in discussions of cloud computing but that rest on a basic misunderstanding of how it works.
Managed Identities in Azure make handling basic authentication and authorization tasks between resources in your subscription significantly easier. Recently I had a customer who wanted to authenticate from a dbt container running in Azure Container Instances to their Databricks instance. Their original plan was to use Key Vault and service principals to authenticate, but I presented them with a solution built on Managed Identities that was significantly simpler.