Download high-quality sample data including latitude and longitude coordinates

Last night I stumbled upon SampleData.io, a website that offers sample datasets which you can use for testing or demo purposes. That's quite unique, since proper sample data is hard to find.

Local first and last names are used to populate the datasets. All records contain real addresses and fictional e-mail addresses, telephone numbers (with local prefixes) and IP addresses. Latitude and longitude coordinates (geolocations) that correspond to the real addresses are included, as well as credit card numbers, expiry dates, CVVs and issuers. The credit card numbers are valid according to the Luhn algorithm.
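
If you'd like to verify that yourself after loading a (preview) file into SQL Server, a Luhn check is quick to write. Here's a minimal T-SQL sketch (the card number is just a sample value, and master..spt_values is only used as a numbers table):

DECLARE @CardNumber VARCHAR(19) = '4539578763621486'

;WITH Digits AS
(
SELECT
 Position = v.number --Digit position, counted from the right
,Digit = CAST(SUBSTRING(REVERSE(@CardNumber),v.number,1) AS INT)
FROM master..spt_values AS v --Simple numbers table
WHERE v.type = 'P' AND v.number BETWEEN 1 AND LEN(@CardNumber)
)
SELECT CASE WHEN SUM(CASE WHEN Position % 2 = 0 --Double every second digit from the right, subtract 9 when it exceeds 9
                          THEN CASE WHEN Digit * 2 > 9 THEN Digit * 2 - 9 ELSE Digit * 2 END
                          ELSE Digit END) % 10 = 0
            THEN 'Valid (Luhn)' ELSE 'Invalid' END AS LuhnCheck
FROM Digits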

Here are a few examples. For instance, the global sample dataset:

Here is an overview of all unique addresses in Europe:

And last but not least, all unique addresses in the United States of America:

With 80 downloadable datasets, almost all Western countries are available as a download, ranging from 2,500 up to 250,000 records with unique (real) addresses and their geolocations (latitude and longitude coordinates). Instant delivery via e-mail, including a receipt for business expenses.

Oh…and don’t forget to click ‘Preview‘ on the Download-page to download a small sample file in CSV format.

So get your Power BI reports ready for some testing! With these geolocations (latitude and longitude coordinates), you’re able to build nice geographical (demo) reports.

Don’t forget the applications for Data Science, since these datasets can be used to test your Machine Learning models. Think about Dynamic Pricing, Product Recommendations and Fraud Detection.

How to automatically export/backup the MDS model

I noticed that I was making backups of the MDS model manually, so it’s time to automate that in a basic but effective way 🙂
When using CMD, it’s always tricky to get the correct timestamp (which I’ll use in the output filename). I found a nice script at: https://stackoverflow.com/questions/1064557/creating-a-file-name-as-a-timestamp-in-a-batch-job

Normally, when you would like to import or export/backup an MDS model, you’ll use these scripts:

CREATE:
CD C:\Program Files\Microsoft SQL Server\130\Master Data Services\Configuration
MDSModelDeploy createpackage -service MDS1 -model "Biml meta-data" -version "VERSION_1" -package "E:\MDS\BimlMetaData_20170804.pkg" -includedata

DEPLOY:
CD C:\Program Files\Microsoft SQL Server\130\Master Data Services\Configuration
MDSModelDeploy deployclone -package "E:\MDS\BimlMetaData_20170804.pkg" -service MDS1

 

To automate this, create a BATCH-file (*.bat) in your local ‘MDS backup directory’ (in my case ‘E:\MDS\’) and copy/paste the following script into it:

:: THIS CODE WILL DISPLAY A 2-DIGIT TIMESTAMP FOR USE IN APPENDING FILENAMES

:: CREATE VARIABLE %TIMESTAMP%

for /f "tokens=1-8 delims=.:/-, " %%i in ('echo exit^|cmd /q /k"prompt $D $T"') do (
    for /f "tokens=2-4 delims=/-,() skip=1" %%a in ('echo.^|date') do (
        set dow=%%i
        set mm=%%j
        set dd=%%k
        set yy=%%l
        set hh=%%m
        set min=%%n
        set sec=%%o
        set hsec=%%p
    )
)

:: ensure that hour is always 2 digits

if %hh%==0 set hh=00
if %hh%==1 set hh=01
if %hh%==2 set hh=02
if %hh%==3 set hh=03
if %hh%==4 set hh=04
if %hh%==5 set hh=05
if %hh%==6 set hh=06
if %hh%==7 set hh=07
if %hh%==8 set hh=08
if %hh%==9 set hh=09

:: assign timeStamp:
:: Add the date and time parameters as necessary - " yy-mm-dd-dow-min-sec-hsec "

set timeStamp=%yy%%mm%%dd%_%hh%%min%%sec%

:: --------- TIME STAMP DIAGNOSTICS -------------------------

:: Un-comment these lines to test output

:: echo dayOfWeek = %dow%
:: echo year = %yy%
:: echo month = %mm%
:: echo day = %dd%
:: echo hour = %hh%
:: echo minute = %min%
:: echo second = %sec%
:: echo hundredthsSecond = %hsec%
:: echo.
:: echo Hello!
:: echo Today is %dow%, %mm%/%dd%.
:: echo.
:: echo Your timestamp will look like this: %timeStamp%
:: echo.
:: echo.
:: echo.
:: pause

CD C:\Program Files\Microsoft SQL Server\130\Master Data Services\Configuration

MDSModelDeploy createpackage -service MDS1 -model "Biml meta-data" -version "VERSION_1" -package "E:\MDS\BimlMetaData_%timeStamp%.pkg" -includedata

 

IMPORTANT: Make sure that you change the ‘MDS backup directory’ in the code above.

Now create a scheduled task (standard Windows functionality) that executes the BATCH-file, either at a specific time and/or at system startup:
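
For example via schtasks.exe (the task name and BATCH-file name below are just examples, assuming the script above was saved as E:\MDS\BackupMDSModel.bat):

:: Run the backup every night at 02:00
schtasks /Create /TN "MDS model backup" /TR "E:\MDS\BackupMDSModel.bat" /SC DAILY /ST 02:00 /RU SYSTEM

:: Or run it at system startup
schtasks /Create /TN "MDS model backup (startup)" /TR "E:\MDS\BackupMDSModel.bat" /SC ONSTART /RU SYSTEM

Of course, the Task Scheduler GUI works just as well.
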
Other handy scripts:
PowerShell: Delete old files based on a given retention period (days) #maintenance

Use PowerShell to script all database objects

 


 


 

SSIS 2016: Retrieving the ErrorDescription and ErrorColumnName

Since SQL Server 2016 there is a ‘built-in’ function to retrieve the actual name of the column that caused the error. But there’s also a flaw: in the case of a database constraint violation, the ‘ErrorColumn’ input parameter remains empty. So I’ve added an ELSE statement for when it doesn’t receive a value:

How to set it up

1) Copy the script below

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Translate the error code into a readable description
    Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);

    // IDTSComponentMetaData130 (SQL Server 2016+) exposes GetIdentificationStringByID()
    IDTSComponentMetaData130 componentMetaData = this.ComponentMetaData as IDTSComponentMetaData130;

    if (Row.ErrorColumn > 0)
    {
        // Look up the identification string (component + column name) of the failing column
        Row.ErrorColumnName = componentMetaData.GetIdentificationStringByID(Row.ErrorColumn);
    }
    else
    {
        // ErrorColumn stays 0 when the error was raised by a database constraint
        Row.ErrorColumnName = "Database constraint :)";
    }
}


2) Paste it into the script window and save it


3) Select both ‘ErrorCode’ and ‘ErrorColumn’ as input parameters


4) Now add two new output columns as shown below

In my case, both output columns are of data type string (DT_STR) with a size of 250.

5) Add a Derived Column to the Data Flow after the Script Component and use the following SSIS expression to replace the original output value of ‘ErrorColumnName’

REPLACE(REVERSE(SUBSTRING(REVERSE(ErrorColumnName),1,FINDSTRING(REVERSE(ErrorColumnName),"[",1) - 1)),"]"," ")

(The original output value also contains the component’s name and we’re just interested in the column name itself)
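
For example (with made-up names): an identification string like ‘OLE DB Destination.Inputs[OLE DB Destination Input].Columns[CustomerName]’ comes out of the expression as just ‘CustomerName’ (plus a trailing space, because the closing ‘]’ is replaced by a space).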

 

There you go!
Many thanks to my colleague Gigi for the SSIS Expression 🙂

 


 


 

Hashing within the Azure Data landscape

The research

I did some small research into the best option for hashing (e.g. for determining deltas or hashing concatenated natural keys). Up until now, the algorithm used for hashing has been MD5. Please note that we’re not discussing the hashing of passwords here. Want to read more? https://www.linkedin.com/feed/update/urn:li:activity:6295982490112925696

 

Which hashing algorithm should we use?

MD5 has a minimal performance impact and the output (VARBINARY or CHAR(32)) has the lowest storage footprint. But…as of SQL Server 2016, all hashing algorithms except SHA2_256 and SHA2_512 are deprecated:

“…Beginning with SQL Server 2016, all algorithms other than SHA2_256, and SHA2_512 are deprecated. Older algorithms (not recommended) will continue working, but they will raise a deprecation event…”

https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql

 

Basically Microsoft is saying goodbye to all other/older algorithms. When you look at the current Azure data landscape, it’s a safe bet that in the near future you won’t be able to keep using MD5 in your Data Warehouse or Data Mart (SQL Server 2017+, Azure SQL DB and Azure SQL DWH).

So… SHA2_256 it is for hashing keys. I guess that for hashing passwords or other sensitive values, you’d use SHA2_512 in combination with a salt.

 

Wouldn’t that impact the (ETL) performance?

Yep, unfortunately. But please keep in mind that we just need to keep evolving, and Microsoft is in charge 🙂 Just make sure that you only hash values once, e.g. in your staging/landing area.
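
For example, a minimal sketch of hashing a concatenated natural key once during the staging load (the table stg.Customer, its key columns CustomerCode and CountryCode, and the HashKey column are made-up names):

UPDATE stg.Customer
SET HashKey = CONVERT(CHAR(64),HASHBYTES('SHA2_256',CONCAT(CustomerCode,'|',CountryCode)),2) --The '|' delimiter keeps different key combinations from producing the same concatenated string

Downstream (deltas, lookups) you then compare the stored HashKey instead of re-hashing the values again.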

 

How should we use it?

Within SQL Server, Azure SQL DWH or Azure SQL DB, you’ll probably use HASHBYTES() to calculate the hash, and it returns a VARBINARY. Personally I’m not a big fan of storing ‘varbinaries’ in a DWH, which is why I store it as a CHAR(). I’m not sure whether VARBINARY is supported in all of Azure’s services (ML, etc.?).

Depending on the hashing algorithm, HASHBYTES() returns hashes of different sizes/lengths: MD5 returns 16 bytes, SHA1 20 bytes, SHA2_256 32 bytes and SHA2_512 64 bytes.
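
A quick way to check this yourself is DATALENGTH() (a minimal sketch; @TESTVAL is just a sample value):

DECLARE @TESTVAL NVARCHAR(100) = N'Hello world'

SELECT
 DATALENGTH(HASHBYTES('MD5',@TESTVAL)) AS MD5_Bytes --16
,DATALENGTH(HASHBYTES('SHA1',@TESTVAL)) AS SHA1_Bytes --20
,DATALENGTH(HASHBYTES('SHA2_256',@TESTVAL)) AS SHA2_256_Bytes --32
,DATALENGTH(HASHBYTES('SHA2_512',@TESTVAL)) AS SHA2_512_Bytes --64
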
The VARBINARY values themselves are hexadecimal values, displayed with a ‘0x’ prefix.

Now…how to cast/convert it to CHAR()? On LinkedIn we had a short discussion about using CAST() or CONVERT(), since they output different strings. My advice would be to use CONVERT(), since it represents the actual VARBINARY value (with or without the ‘0x’ prefix).

If you want the ‘0x’ prefix included in the output string, use style (1) (note the two extra characters for the prefix):
SELECT CONVERT(CHAR(66),HASHBYTES('SHA2_256',@TESTVAL),(1))

Personally I think you can lose the prefix, because you can always concatenate it back on when needed. So use style (2):
SELECT CONVERT(CHAR(64),HASHBYTES('SHA2_256',@TESTVAL),(2))

Now when you convert the VARBINARY to a CHAR(xx), the size/length doubles: every byte becomes two hexadecimal characters, so a 32-byte SHA2_256 hash becomes a CHAR(64) and a 64-byte SHA2_512 hash becomes a CHAR(128).

Conclusion

So basically it all comes down to:

Generate a SHA2_256 hash like SELECT CONVERT(CHAR(64),HASHBYTES('SHA2_256',@TESTVAL),(2))
Generate a SHA2_512 hash like SELECT CONVERT(CHAR(128),HASHBYTES('SHA2_512',@TESTVAL),(2))

Store a SHA2_256 hash as CHAR(64)
Store a SHA2_512 hash as CHAR(128)

Unless Microsoft decides to keep supporting MD5 or SHA1 in SQL Server vNext / HASHBYTES(), my advice is to start using SHA2_256 for key hashes.

In case you don’t need to combine Azure services and just stick to a ‘normal’ DWH on SQL Server, storing a SHA2_256 hash as BINARY(32) would be the best choice; BINARY as a data type might not be supported by every Azure service or tool.

Happy hashing! 🙂

 


 


 

How to Encrypt and Decrypt Text in SQL Server (2008+)

Here’s a simple example on how to encrypt (and decrypt) values in SQL Server:


 

--How to Encrypt and Decrypt Text in SQL Server (2008+)
--Example by Clint Huijbers

DECLARE
@Key1 NVARCHAR(50) = N'8g87g8ag4r8hinsg^%$#F&^F&^F^F&ÛIBOUG(%*^R&$%&#%^C5'
,@Key2 NVARCHAR(50) = N'&*^%R%^**F$F*OUYBYUB*F%74d654d7f685f65f56f6v6vc88d'
,@ValueToEncrypt VARCHAR(150) = 'This is the value that will be encrypted by the built-in function of SQL Server with two keys and 128-bits encryption :) blablablablablablablablablabl'
,@EncryptedValue VARBINARY(350) --Increase the VARBINARY size when you increase the length of @ValueToEncrypt!

SELECT @EncryptedValue = ENCRYPTBYPASSPHRASE((@Key1+N'||'+@Key2),@ValueToEncrypt)
SELECT @EncryptedValue

SELECT CONVERT(VARCHAR(150),DECRYPTBYPASSPHRASE((@Key1+N'||'+@Key2),@EncryptedValue))
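
And a quick sketch of storing and reading back such a value, appended to the script above so the declared variables are still in scope (the table dbo.SecureNotes is just an example name):

CREATE TABLE dbo.SecureNotes
(
 NoteID INT IDENTITY(1,1) PRIMARY KEY
,EncryptedNote VARBINARY(350) NOT NULL
)

INSERT INTO dbo.SecureNotes (EncryptedNote)
SELECT ENCRYPTBYPASSPHRASE((@Key1+N'||'+@Key2),@ValueToEncrypt)

SELECT
 NoteID
,CONVERT(VARCHAR(150),DECRYPTBYPASSPHRASE((@Key1+N'||'+@Key2),EncryptedNote)) AS DecryptedValue
FROM dbo.SecureNotes

Note that ENCRYPTBYPASSPHRASE() produces a different VARBINARY each time you call it for the same input, so always compare by decrypting rather than by comparing the encrypted values.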

 


 


 

SQL Table/CTE for SQL Server, SSIS and Biml data types

Cathrine Wilhelmsen posted an easy lookup table for the data types within SQL Server, SSIS and Biml.

For loading the meta-data information based on source tables, I find it handy to provide all three data types in the result set of the query. So here’s an easy Common Table Expression / Table Value Constructor which might help you also.


WITH DataTypeMatrix (DataTypeSQL, DataTypeSSIS, DataTypeBiml)
AS
(
SELECT *
FROM
(
VALUES
('bigint','DT_I8','Int64')
,('binary','DT_BYTES','Binary')
,('bit','DT_BOOL','Boolean')
,('char','DT_STR','AnsiStringFixedLength')
,('date','DT_DBDATE','Date')
,('datetime','DT_DBTIMESTAMP','DateTime')
,('datetime2','DT_DBTIMESTAMP2','DateTime2')
,('datetimeoffset','DT_DBTIMESTAMPOFFSET','DateTimeOffset')
,('decimal','DT_NUMERIC','Decimal')
,('float','DT_R8','Double')
,('geography','DT_IMAGE','Object')
,('geometry','DT_IMAGE','Object')
,('hierarchyid','DT_BYTES','Object')
,('image','DT_IMAGE','Binary')
,('int','DT_I4','Int32')
,('money','DT_CY','Currency')
,('nchar','DT_WSTR','StringFixedLength')
,('ntext','DT_NTEXT','String')
,('numeric','DT_NUMERIC','Decimal')
,('nvarchar','DT_WSTR','String')
,('real','DT_R4','Single')
,('rowversion','DT_BYTES','Binary')
,('smalldatetime','DT_DBTIMESTAMP','DateTime')
,('smallint','DT_I2','Int16')
,('smallmoney','DT_CY','Currency')
,('sql_variant','DT_WSTR','Object')
,('text','DT_TEXT','AnsiString')
,('time','DT_DBTIME2','Time')
,('timestamp','DT_BYTES','Binary')
,('tinyint','DT_UI1','Byte')
,('uniqueidentifier','DT_GUID','Guid')
,('varbinary','DT_BYTES','Binary')
,('varchar','DT_STR','AnsiString')
,('xml','DT_NTEXT','Xml')
) AS F (DataTypeSQL, DataTypeSSIS, DataTypeBiml)
)

SELECT *
FROM DataTypeMatrix
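
For example, to list the SSIS and Biml data types for all columns of a source table, you could replace the final SELECT with a join to INFORMATION_SCHEMA.COLUMNS (dbo.Customer below is just a placeholder table name):

SELECT
 c.TABLE_SCHEMA
,c.TABLE_NAME
,c.COLUMN_NAME
,c.DATA_TYPE AS DataTypeSQL
,m.DataTypeSSIS
,m.DataTypeBiml
FROM INFORMATION_SCHEMA.COLUMNS AS c
LEFT JOIN DataTypeMatrix AS m ON m.DataTypeSQL = c.DATA_TYPE
WHERE c.TABLE_SCHEMA = 'dbo'
AND c.TABLE_NAME = 'Customer'
ORDER BY c.ORDINAL_POSITION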




Need to re-format the T-SQL code? Use the Notepad++ plugin ‘Poor Man’s T-Sql Formatter’.

 


 


 

MDS: How to create or deploy a package from a model

A few lines I need to remember for MDS 🙂

CREATE:

CD C:\Program Files\Microsoft SQL Server\130\Master Data Services\Configuration

MDSModelDeploy createpackage -service MDS1 -model "Biml meta-data" -version "VERSION_1" -package "C:\MDS\BimlMetaData_20170614.pkg" -includedata

DEPLOY:

CD C:\Program Files\Microsoft SQL Server\130\Master Data Services\Configuration

MDSModelDeploy deployclone -package "C:\MDS\BimlMetaData_20170614.pkg" -service MDS1
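
In case you don’t remember the exact service, model or version names, the same tool can list them (a quick sketch, assuming the MDS1 service name from the examples above):

MDSModelDeploy listservices
MDSModelDeploy listmodels -service MDS1
MDSModelDeploy listversions -model "Biml meta-data" -service MDS1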


SSMS Database Diagram – backend version is not supported

‘…backend version is not supported…’ was a new one for me. But when you think about it, you can’t use an ‘older’ version of SSMS to create a database diagram of a database on a SQL Server instance that runs a newer version. One option is to download the latest stand-alone version of SSMS (which will also automatically update itself, btw!).

Download SQL Server Management Studio (SSMS)
https://msdn.microsoft.com/en-us/library/mt238290.aspx









SQL Server 2014 Developer Edition is now a free download

As of the 31st of March, SQL Server 2014 Developer Edition is now a free download:
https://myprodscussu1.app.vssubscriptions.visualstudio.com/Downloads?pid=1682


Quote from MSDN:

SQL Server 2014 Developer Edition is now a free download for Visual Studio Dev Essentials members (you will be prompted to sign in to Visual Studio Dev Essentials before you can download SQL Server 2014 Developer Edition). We are making this change so that all developers can leverage the capabilities that SQL Server 2014 has to offer for their data solution, and this is another step in making SQL Server more accessible. SQL Server Developer Edition is for development and testing only, and not for production environments or for use with production data.

Visual Studio Dev Essentials is Microsoft’s most comprehensive free developer program ever, with everything you need to build and deploy your app on any platform, including state-of-the-art tools, the power of the cloud, training, and support.

Ohh….and this is also a nice one:

SQL Server 2016 Developer Edition, when released later this year, will also be free.

Read the full post on the SQL Server Blog: https://blogs.technet.microsoft.com/dataplatforminsider/2016/03/31/microsoft-sql-server-developer-edition-is-now-free/

 

Evaluation Editions:
Just want to download an evaluation edition?:
Microsoft® SQL Server® 2014 & 2016 Evaluation
Microsoft® SQL Server® 2012 Evaluation
