Hashing within the Azure Data landscape

The research

I’ve did a small research about what’s the best option is for hashing (e.g. determining deltas or hashing concatenated natural keys). Up until now, the algorithm used for hashing is MD5. Please note that we’re not discussing the hashing of passwords. Want to read more? https://www.linkedin.com/feed/update/urn:li:activity:6295982490112925696

 

Which hashing algorithm should we use?

MD5 has a minimal performance impact and the output (VARBINARY or CHAR(32)) has the lowest storage footprint. But…as of SQL Server 2016, all hashing algorithms except SHA2_256 and SHA2_512 are deprecated:

“…Beginning with SQL Server 2016, all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event…”

https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql

 

Basically Microsoft is saying good bye to all other/older algorithms. When you look at the current Azure data landscape, you bet that for the near future you’re unable to continue using MD5 in your Data Warehouse or Data Mart (SQL Server 2017+, Azure SQL DB and Azure SQL DWH).

So… SHA2_256 it is for hashing keys. I guess for hashing passwords or other sensitive values, use SHA2_512 in combination with salt.

 

Wouldn’t that impact the (ETL) performance?

Yep, unfortunately. But please keep in mind that we just need to continue evolving and Microsoft is in charge 🙂 Just make sure that you only need to hash values once, e.g. in your staging/landing area.

 

How should we use it?

Within SQL Server, Azure SQL DWH or Azure SQL DB, you’ll probably use HASHBYTES() to calculate the hash and it returns a VARBINARY. Personally I’m not a big fan of storing ‘varbinaries’ in a DWH, that why I’m storing it as a CHAR(). I’m not sure whether VARBINARY is supported in all of Azure’s services (ML, etc.?).

Depending on the hashing algorithm, HASHBYTES() will return hashes with different sizes/lengths (bytes):



 


 


 

The VARBINARY values are:



 


 

Now…how to cast/convert it to CHAR()? On LinkedIn we had a short discussion about using CAST() or CONVERT(), since they output different strings. My advice would be to use CONVERT(), since it represents the actual VARBINARY-value (with or without the ‘0x’-prefix).

If you want to have the prefix included in the output string, use:
SELECT CONVERT(CHAR(),HASHBYTES('',''),(1))

Personally I think that you can lose the prefix, because you can always concatenate it if needed. So:
SELECT CONVERT(CHAR(),HASHBYTES('',''),(2))



 


 

Now when you’re converting the VARBINARY to a CHAR(xx), the size/length increases:



 


 

Conclusion

So basically it all come to:

Generate a SHA2_256 hash like SELECT CONVERT(CHAR(64),HASHBYTES('SHA2_256',@TESTVAL),(2))
Generate a SHA2_512 hash like SELECT CONVERT(CHAR(128),HASHBYTES('SHA2_512,@TESTVAL),(2))

Store a SHA2_256 hash as CHAR(64)
Store a SHA2_512 hash as CHAR(128)

Unless Microsoft decides to support MD5 or SHA1 within SQL Server vNext / HASHBYTES(), my advice will be to start using SHA2_256 for key hashes.

In case you don’t need to combine Azure services and just stick to a ‘normal’ DWH on SQL Server, storing a SHA2_256 as BINARY(32) would be the best choice. BINARY() as a datatype could not be supported in a specific service or tool.

Happy hashing! 🙂

 


 


 

Advertisements

How to Encrypt and Decrypt Text in SQL Server (2008+)

Here’s a simple example on how to encrypt (and decrypt) values in SQL Server:

How to Encrypt and Decrypt Text in SQL Server

 

--How to Encrypt and Decrypt Text in SQL Server (2008+)
--Example by Clint Huijbers

DECLARE
@Key1 NVARCHAR(50) = N'8g87g8ag4r8hinsg^%$#F&^F&^F^F&ÛIBOUG(%*^R&$%&#%^C5'
,@Key2 NVARCHAR(50) = N'&*^%R%^**F$F*OUYBYUB*F%74d654d7f685f65f56f6v6vc88d'
,@ValueToEncrypt VARCHAR(150) = 'This is the value that will be encrypted by the built-in function of SQL Server with two keys and 128-bits encryption :) blablablablablablablablablabl'
,@EncryptedValue VARBINARY(350) --Increase the VARBINARY size when you increase the lengh of @ValueToEncrypt!

SELECT @EncryptedValue = ENCRYPTBYPASSPHRASE((@Key1+N'||'+@Key2),@ValueToEncrypt)
SELECT @EncryptedValue

SELECT CONVERT(VARCHAR(150),DECRYPTBYPASSPHRASE((@Key1+N'||'+@Key2),@EncryptedValue))

 


 


 

SQL Server 2014 Developer Edition is now a free download

As of the 31st of March, SQL Server 2014 Developer Edition is now a free download:
https://myprodscussu1.app.vssubscriptions.visualstudio.com/Downloads?pid=1682

SQL Server 2014 Dev Download VS

Quote from MSDN:

SQL Server 2014 Developer Edition is now a free download for Visual Studio Dev Essentials members (you will be prompted to sign in to Visual Studio Dev Essentials before you can download SQL Server 2014 Developer Edition). We are making this change so that all developers can leverage the capabilities that SQL Server 2014 has to offer for their data solution, and this is another step in making SQL Server more accessible. SQL Server Developer Edition is for development and testing only, and not for production environments or for use with production data.

Visual Studio Dev Essentials is Microsoft’s most comprehensive free developer program ever, with everything you need to build and deploy your app on any platform, including state-of-the-art tools, the power of the cloud, training, and support.

Ohh….and this is also a nice one:

SQL Server 2016 Developer Edition, when released later this year, will also be free.

Read the full post on the SQL Server Blog: https://blogs.technet.microsoft.com/dataplatforminsider/2016/03/31/microsoft-sql-server-developer-edition-is-now-free/

 

Evaluation Editions:
Just want to download an evaluation edition?:
Microsoft® SQL Server® 2014 & 2016 Evaluation
Microsoft® SQL Server® 2012 Evaluation

SQL Server 2014 Dev Download

 


 


 


 

Pause SQL Server service before restarting

Did you know that by pausing the SQL Server service before restarting the instance we allow end users to continue their work uninterrupted and we also stop any new connections to the instance? This is a nicer way of telling people to “get out” of the database in order for the server to be rebooted. I wouldn’t leave the server paused for 60 minutes of course, but I would rather use this method than forcibly disconnect users and rollback their transactions.

When a server is paused you will see messages similar to this in the SQL Server error log:

Error: 17142, Severity: 14, State: 0.
SQL Server service has been paused. No new connections will be allowed. To resume the
service, use SQL Computer Manager or the Services application in Control Panel.

Error: 18456, Severity: 14, State: 13.
Login failed for user ''. Reason: SQL Server service is paused.
No new connections can be accepted at this time. [CLIENT: ]

Next time you are worried about rebooting during the day think about the pause button instead. It might be a nice compromise for your end-users.

Thank Thomas Larock for his blog post: http://thomaslarock.com/2016/01/pause-sql-server-service-before-restarting/








SQL Server Window Functions Cheat Sheet (OVER-clause)

It all started with SQL Server 2005 which introduced the OVER-clause. It was (back then) possible to add a row number to a dataset by using something similair to:
ROW_NUMBER() OVER()

You were also able to use the Ranking functions, such as ROW_NUMBER, RANK, DENSE_RANK and NTILE.

Now with SQL Server 2012, Microsoft introduced a few more so called ‘Window functions’ (aggregate and analytic):

SQL Server Window Function OVER clause


Here’s an overview:
sql-server-2012-window-functions-over-clause-cheat-sheet-by-clint-huijbers


Download:
Download my cheat sheet here (in Word format)








FORMAT() function in SQL Server 2012/2014

The FORMAT() function is one to really remember, it saves you a lot of time.
Here are a few examples:


SELECT
FORMAT(12345678,'n') AS n
,FORMAT(12345678,'n0') AS n0
,FORMAT(12345678,'n0','nl-NL') AS n0NL
,FORMAT(0.8856,'p') AS p
,FORMAT(0.8856,'p1') AS p1
,FORMAT(12345678,'c') AS c
,FORMAT(12345678,'c1') AS c1
,FORMAT(12345678,'c1','nl-NL') AS c1NL

DECLARE @d DATETIME = '10/01/2011'

SELECT FORMAT ( @d, 'D', 'en-US' ) AS 'US English Result'
,FORMAT ( @d, 'D', 'en-gb' ) AS 'Great Britain English Result'
,FORMAT ( @d, 'D', 'de-de' ) AS 'German Result'
,FORMAT ( @d, 'D', 'zh-cn' ) AS 'Chinese (Simplified PRC) Result';



sql-server-2012-format-function-examples-with-regional-code



For more examples, please visit:
http://www.codeproject.com/Articles/610089/SQL-Servers-FORMAT-function








SSIS: Extract package information (e.g. DataFlowTaskName, TaskDescription, PrecedenceConstraint, etc.)

Found this great blog post from Saravanan:
http://sqlmall.wordpress.com/2014/05/29/extract-ssis-2012-package-data/

He describes a way to search through the XML coding of your SSIS packages, to for example, look for a certain DataFlow which is called ‘RAW_DWH_Main‘.

Step 1) Enable FILESTREAM via SQL Server Configuration Manager

http://sqlserverscribbles.com/2013/07/09/how-to-enable-and-configure-filestream-in-sql-server-2008-2012/
http://msdn.microsoft.com/en-us/library/cc645923.aspx

Step 2) Create the FileTable (and a new database)
NOTE: Make sure to fill in an existing path for the directory.


EXEC sp_configure filestream_access_level, 2
RECONFIGURE
GO

CREATE DATABASE PackageDefinition
ON PRIMARY
(Name = PackageDefinition,
FILENAME = 'E:\FileTableDB\PackageDefinition.mdf'),
FILEGROUP FTFG CONTAINS FILESTREAM
(NAME = Package,
FILENAME='E:\FileTableDB\PackageDefinition')
LOG ON (Name = PackageDefinitionLog,
FILENAME = 'E:\FileTableDB\PackageDefinitionLog.ldf')
WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
DIRECTORY_NAME = N'PackageDefinition');
GO

USE PackageDefinition;
CREATE SCHEMA FT
GO

CREATE TABLE FT.PackageDefinition AS FileTable
WITH
(FileTable_Directory = 'PackageDirectory');
GO

Here’s your new FileTable:

Search-in-ssis-packages-with-sql-filetable-structure


Explore the directory and copy/paste your SSIS packages here

Search-in-ssis-packages-with-sql-filetable-explore-directory


Now copy (or perhaps move/relocate) your TFS to this directory.

Almost done…run the query!

;WITH XMLNAMESPACES
(
'www.microsoft.com/SqlServer/Dts' AS DTS
),

PackageDefinition
AS
( /*Query the file content([file_stream]) from filetable*/
SELECT *
,FileTableRootPath() AS RootPath
,file_stream.GetFileNamespacePath() as FilePath
,path_locator.GetLevel() AS PathLevel
,PackageData = CAST(CAST([file_stream] AS VARBINARY(MAX)) AS XML )
FROM [FT].[PackageDefinition]
WHERE file_type = 'dtsx' --SSIS Packages
--AND Name LIKE '%RAW_DWH_Main%' --FileName
AND file_stream.GetFileNamespacePath() LIKE '\PackageDirectory\DWH.SSIS\%' --Project folder
AND path_locator.GetLevel() = 2

)

SELECT r.*
FROM
(
SELECT
RootPath
,FilePath
,PathLevel
,PackageName = Exe.pkg.value ('@DTS:ObjectName' ,'VARCHAR(250)')
,CreationDate = Exe.pkg.value ('@DTS:CreationDate' ,'DATETIME2(0)')
,CreatorName = Exe.pkg.value ('@DTS:CreatorName' ,'VARCHAR(250)')
,DataFlowTaskName = Task.pkg.value ('@DTS:Description' ,'VARCHAR(250)')
,TaskDescription = Task.pkg.value ('@DTS:ObjectName' ,'VARCHAR(250)')
,PrecedenceConstraint = '['+REPLACE((pc.pkg.value ('@DTS:From','VARCHAR(250)')),'Package\','')+'] ' +'---->' + '['+REPLACE((pc.pkg.value ('@DTS:To' ,'VARCHAR(250)')),'Package\','') +']'
,refId = Con.pkg.value ('@DTS:refId' ,'VARCHAR(250)')
,ConnectionProvider = Con.pkg.value ('@DTS:CreationName' ,'VARCHAR(250)')
,ConnectionManagerName = Con.pkg.value ('@DTS:ObjectName' ,'VARCHAR(250)')
,ConnectionString = Par.pkg.value ('@DTS:ObjectName' ,'VARCHAR(250)')
,ParameterName = CM.pkg.value ('@DTS:ConnectionString' ,'VARCHAR(250)')
,ParameterValue = Par.pkg.value ('.' ,'VARCHAR(250)')
,VariableName = Var.pkg.value ('@DTS:ObjectName' ,'VARCHAR(250)')
,VariableValue = Var.pkg.value ('.' ,'VARCHAR(250)')
,PackageData
FROM PackageDefinition
OUTER APPLY PackageData.nodes('/DTS:Executable') AS Exe(pkg)
OUTER APPLY Exe.pkg.nodes('DTS:Executables/DTS:Executable') AS Task(pkg)
OUTER APPLY Exe.pkg.nodes('DTS:PackageParameters/DTS:PackageParameter') AS Par(pkg) --Will only return data while using the 'Package Deployment'-model I guess.
OUTER APPLY Exe.pkg.nodes('DTS:Variables/DTS:Variable') AS Var(pkg)
OUTER APPLY Exe.pkg.nodes('DTS:ConnectionManagers/DTS:ConnectionManager/DTS:ObjectData/DTS:ConnectionManager') AS CM(pkg)
OUTER APPLY Exe.pkg.nodes('DTS:ConnectionManagers/DTS:ConnectionManager') AS Con(pkg)
OUTER APPLY Exe.pkg.nodes('DTS:PrecedenceConstraints/DTS:PrecedenceConstraint') AS pc(pkg)
) AS r
WHERE r.DataFlowTaskName LIKE '%MyDataFlow%'
ORDER BY FilePath



The results:

Search-in-ssis-packages-with-sql-filetable-xml-query-results


Important note:
The query doesn’t return any nested tasks, you may need to use a LIKE-operator for that:
WHERE CAST(PackageData AS NVARCHAR(MAX)) LIKE '%%'

If you have a solution for this, please let me know or post the solution in the comments. Thanks!


More info about FileTables:
http://blog.tallan.com/2012/01/01/its-a-file-system-its-a-database-table-its-sql-server-denali-filetable/

How-filetables-work-more-info-sql-server