Basic introduction to MS PDW

13 December 2013 | SQL Stijn | Uncategorized, MS PDW

Since I have been working with a Parallel Data Warehouse (PDW) for the past month, it is only logical that my next post is on this subject. Because PDW is a relatively new product from Microsoft, I thought an introductory post would be in order. In this post I’ll give you a short overview of the hardware and architecture a PDW uses, and of the new concept of ELT versus ETL.

The hardware!

The PDW uses an InfiniBand network connection. This great piece of technology delivers about 25 Gbps, which makes loads and data traffic within the PDW lightning fast! With it you can load giant amounts of data into your PDW in minutes instead of hours!
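
To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. The 1 TB payload and the 1 Gbps comparison link are my own assumptions for illustration, and the calculation ignores protocol overhead, disk speed and everything else that slows down a real load, so treat the numbers as a theoretical best case.

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal transfer time: payload bits divided by link speed in bits per second."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)

ONE_TB = 1e12  # hypothetical payload: 1 TB (decimal), not a number from a real load

infiniband = transfer_seconds(ONE_TB, 25)  # the roughly 25 Gbps quoted in this post
gigabit = transfer_seconds(ONE_TB, 1)      # assumed comparison: a plain 1 Gbps link

print(f"25 Gbps: {infiniband / 60:.1f} minutes")
print(f" 1 Gbps: {gigabit / 3600:.1f} hours")
```

Under these assumptions the same terabyte needs a few minutes on the fast link and a couple of hours on the slow one, which is the "minutes instead of hours" difference in a nutshell.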

The PDW uses MPP (massively parallel processing), which means it splits the work it is doing across different nodes. In my case that is 2 compute nodes and 1 control node, each with its own server, processors and cores (a quarter-rack PDW). This kind of architecture can make your (complex) queries around 10 times faster. Check the table below for the different PDW configurations (source: http://saldeloera.wordpress.com/2012/07/09/lesson-1-of-parallel-data-warehouse-basic-architecture-overview/).
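
The idea behind MPP can be shown with a small toy model in Python. This is only a sketch of the principle, not how the PDW is implemented internally: the row layout, the `customer_id` key and the modulo "hash" are all made up for illustration, and the per-node work runs one after the other here, whereas real compute nodes work at the same time.

```python
from collections import defaultdict

COMPUTE_NODES = 2  # the author's setup: 2 compute nodes plus 1 control node

def distribute(rows, key, nodes=COMPUTE_NODES):
    """Assign each row to a compute node based on its distribution key."""
    shards = defaultdict(list)
    for row in rows:
        # a simple modulo stands in for a real hash function
        shards[row[key] % nodes].append(row)
    return shards

def node_partial_sum(shard, column):
    """Work done locally on one compute node, on its own slice of the data."""
    return sum(row[column] for row in shard)

def control_node_total(shards, column):
    """The control node combines the partial results of every compute node."""
    return sum(node_partial_sum(shard, column) for shard in shards.values())

# hypothetical sales rows, only for the example
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 7)]

shards = distribute(rows, "customer_id")
total = control_node_total(shards, "amount")

print({node: len(shard) for node, shard in shards.items()})
print(total)
```

Each node only touches its own share of the rows, and the control node only has to combine a handful of small partial results. That division of labour is where the speed-up comes from.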

Configuration	Servers	Processors	Cores	Space (TB)
HP PDW Full Rack	17	22	132	125
HP PDW Full Rack with 4 Data Racks	47	82	492	500
HP PDW Half Rack	11	8	48	15-60 (optional disk sizes available)

ELT, the new ETL!

PDW can easily load large volumes of data into your data warehouse, which is why the classic ETL process might not be the best option for the PDW.
In most systems today, the fastest method for data manipulation/integration is to extract your data, then transform it, and then load it into your data warehouse. This is not the case for a PDW, because of its incredible load speed on raw, unaltered data.
This is why we choose ELT for PDW: we first extract the data, then load it into our PDW, and only then transform it on the PDW using CTAS statements (Create Table As Select → more info in a future blog post). Thanks to MPP, these CTAS statements run very fast.
Using ELT will prove to be a much better method for your data manipulation/integration on PDW, as it is far less time-consuming than the normal ETL process!
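
To make the ELT order concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in, because SQLite also supports CREATE TABLE ... AS SELECT; it is not PDW code. The table names, columns and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the PDW here

# Extract + Load: raw rows go into a staging table untouched (everything as text)
conn.execute("CREATE TABLE stg_sales (sale_date TEXT, amount TEXT)")
raw_rows = [
    ("2013-12-01", "10.50"),
    ("2013-12-01", "4.50"),
    ("2013-12-02", "20.00"),
]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", raw_rows)

# Transform: build the final table inside the database with CREATE TABLE AS SELECT
conn.execute("""
    CREATE TABLE fact_daily_sales AS
    SELECT sale_date, SUM(CAST(amount AS REAL)) AS total_amount
    FROM stg_sales
    GROUP BY sale_date
""")

result = conn.execute(
    "SELECT sale_date, total_amount FROM fact_daily_sales ORDER BY sale_date"
).fetchall()
print(result)
```

The point is the order of the steps: the data lands in the database first, and the type conversion and aggregation happen afterwards, inside the engine. On a PDW the CTAS statement additionally takes a WITH (DISTRIBUTION = ...) clause that controls how the new table is spread over the compute nodes; that part is left out here because SQLite has no equivalent.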

In my future posts I will get a lot more technical, explaining the different user errors (my own fault :) ) I ran into while configuring my first PDW. These are easy mistakes to make, and once you know about them you will never make them again. The next post will explain all the secrets of DWLOADER.exe.
