Hashing within the Azure Data landscape

The research

I’ve did a small research about what’s the best option is for hashing (e.g. determining deltas or hashing concatenated natural keys). Up until now, the algorithm used for hashing is MD5. Please note that we’re not discussing the hashing of passwords. Want to read more? https://www.linkedin.com/feed/update/urn:li:activity:6295982490112925696

 

Which hashing algorithm should we use?

MD5 has a minimal performance impact and the output (VARBINARY or CHAR(32)) has the lowest storage footprint. But…as of SQL Server 2016, all hashing algorithms except SHA2_256 and SHA2_512 are deprecated:

“…Beginning with SQL Server 2016, all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event…”

https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql

 

Basically Microsoft is saying good bye to all other/older algorithms. When you look at the current Azure data landscape, you bet that for the near future you’re unable to continue using MD5 in your Data Warehouse or Data Mart (SQL Server 2017+, Azure SQL DB and Azure SQL DWH).

So… SHA2_256 it is for hashing keys. I guess for hashing passwords or other sensitive values, use SHA2_512 in combination with salt.

 

Wouldn’t that impact the (ETL) performance?

Yep, unfortunately. But please keep in mind that we just need to continue evolving and Microsoft is in charge 🙂 Just make sure that you only need to hash values once, e.g. in your staging/landing area.

 

How should we use it?

Within SQL Server, Azure SQL DWH or Azure SQL DB, you’ll probably use HASHBYTES() to calculate the hash and it returns a VARBINARY. Personally I’m not a big fan of storing ‘varbinaries’ in a DWH, that why I’m storing it as a CHAR(). I’m not sure whether VARBINARY is supported in all of Azure’s services (ML, etc.?).

Depending on the hashing algorithm, HASHBYTES() will return hashes with different sizes/lengths (bytes):



 


 


 

The VARBINARY values are:



 


 

Now…how to cast/convert it to CHAR()? On LinkedIn we had a short discussion about using CAST() or CONVERT(), since they output different strings. My advice would be to use CONVERT(), since it represents the actual VARBINARY-value (with or without the ‘0x’-prefix).

If you want to have the prefix included in the output string, use:
SELECT CONVERT(CHAR(),HASHBYTES('',''),(1))

Personally I think that you can lose the prefix, because you can always concatenate it if needed. So:
SELECT CONVERT(CHAR(),HASHBYTES('',''),(2))



 


 

Now when you’re converting the VARBINARY to a CHAR(xx), the size/length increases:



 


 

Conclusion

So basically it all come to:

Generate a SHA2_256 hash like SELECT CONVERT(CHAR(64),HASHBYTES('SHA2_256',@TESTVAL),(2))
Generate a SHA2_512 hash like SELECT CONVERT(CHAR(128),HASHBYTES('SHA2_512,@TESTVAL),(2))

Store a SHA2_256 hash as CHAR(64)
Store a SHA2_512 hash as CHAR(128)

Unless Microsoft decides to support MD5 or SHA1 within SQL Server vNext / HASHBYTES(), my advice will be to start using SHA2_256 for key hashes.

In case you don’t need to combine Azure services and just stick to a ‘normal’ DWH on SQL Server, storing a SHA2_256 as BINARY(32) would be the best choice. BINARY() as a datatype could not be supported in a specific service or tool.

Happy hashing! 🙂

 


 


 

How To: Send e-mails from your Windows Azure Virtual Machine (VM) to Office 365 (Exchange)

WindowsAzure

After a few hours of struggling, I found that (for now) the only method of sending e-mails from your SQL Server Instance (inside a Windows Azure Virtual Machine / Environment) is to setup a SMTP-service in IIS 6.0 (Yes I know, but SQL Server doesn’t support the use of ‘TLS encryption‘). Mohamed M Malek posted a step-by-step blogpost which will provide you a grant tour on how to set this up, click here.

I’ve tested this solution in a Windows Azure VM:
Windows Server 2012 Datacenter Edition (from the Virtual Machine Catalog in Windows Azure)
SQL Server 2012 Express Edition

Yes the Express Edition of SQL Server does indeed supports sending mails by using sp_send_dbmail(). More info on how to set that up: click here

Important settings:
Do not fill in any username or password details in Account:
@username=’tanmaya@mydomain.com’,
@password=’1qwe432′,

LocalHost will serve as an SMTP-server for the Account:
@mailserver_name=’localhost’

-> Don’t forget to stop and start the SMTP-service!

The Office 365 version I’m currently using is the ‘Office 365 for professionals and small businesses‘.

BumbleBI launched service provider: SQL Hotel – Shared and dedicated hosting of SQL Server databases and solutions

BumbleBI launched a new service provider called:
SQL Hotel – Shared and dedicated hosting of SQL Server databases and solutions.
 

 
Our servers are hosted in EasyNet’s datacenter at Schiphol-Rijk (Amsterdam, The Netherlands) which is directly connected to one of the world’s fastest internet hubs: Amsterdam Internet Exchange (AMS-IX).
 
Ideal for smaller databases without paying the expensive licensing costs.
Or for the large corporations for world-wide database access for free usage of Excel, PowerPivot, Power View and/or tablet apps.
 
Our Microsoft Business Intelligence (BI) consultants can support deploying your data to this cloud-environment (near realtime of by a daily import job).
 
The databases of these software suites can be made available in our SQL Hotel environment for analysis, reporting, data mining and more:
 

 
Interested? Check one of our websites:
SQLHotel.nl
SQLServerDatabaseHosting.com
 
Spread the word!


SQL Azure Billing Numbers Directly From Transact-SQL

Thanks to Karlo Bartels‘s comment on my previous post, here a detailed blogpost on Microsoft.com about the SQL Azure billing numbers/costs with the use of Dynamic Managed Views.

SQL Azure exposes two Dynamic Managed Views called sys.database_usage and sys.bandwidth_usage that show you the activity for your account. These can queried to understand your account usage from a billing perspective.

Database storage costs:

Bandwidth costs:

Download the code right here.

Thanks to Wayne Walter Berry and his post.

Playing around with SQL Azure

I’ve started playing with SQL Azure on my ’30 days free SQL Azure trial’ and already got a few tips:

Login on SQL Azure’s Developer Portal and create a database.

Don’t forget to modify your Firewall settings of SQL Azure in order to connect to it (Tutorial):
A simple statement does the trick (execute on Master DB):

exec sp_set_firewall_rule N'Allow Project Houston','0.0.0.0','0.0.0.0'

An overview of security in SQL Azure

Project ‘Houston’ is very handy! 🙂

I’ve uploaded AdventureWorksDW2008R2 partially (28,2 MB) with Create-scripts and a small SSIS package:

An overview of my tables within SQL Azure:

I can’t find any traffic stats though.

Next on my to-do-list:
Start using SQL Azure within a PowerPivot workbook (performance check).

I’ll keep you guys posted!