• Home
  • About
  • Blog
  • Speaking
  • Contact

MS PDW

Home MS PDW

Since I have been working with a Parallel Data Warehouse during the past month, it was only logical that my next post would be on this subject. Because a PDW is a relatively new product from Microsoft, I thought an introduction post would be in order. In this post I’ll give you a small amount of info about the hardware/architecture a PDW uses, and the new concept of ELT vs ETL

  1. The hardware!

    The PDW uses the INFINIBAND network connection, this great new piece of technology ensures about 25gbps, which makes loads and data traffic of the PDW lightning fast! With this new technology you will be able to upload giant amounts of data in your PDW in minutes instead of hours!

    The PDW uses MPP, which means he splits up the work he is doing under different nodes, in my case, this is on 2 compute and 1 control node (which have their own server/processors/cores). (Quarter PDW) This kind of architecture makes your (complex) queries 10 times faster. Check the table for the different options of PDW configuration. (Source = http://saldeloera.wordpress.com/2012/07/09/lesson-1-of-parallel-data-warehouse-basic-architecture-overview/)

Configuration Servers Processors Cores Space (Tb)
HP PDW  Full Rack 17 22 132 125
HP PDW  Full Rack with 4 Data Racks 47 82 492 500
HP PDW  Half Rack 11 8 48 15-60 (Optional disc sizes   available)

Since I’m not a real hardware expert I will not be going in deeper on this subject, if you want to know more about the MPP and the hardware PDW uses I suggest you read the blog post from Mark Stacey on http://markgstacey.net/2013/07/07/introduction-to-pdw-parallel-data-warehouse/, this was very helpful for me when I started working on PDW since it explains how it actually works, and this helps you understand how the PDW does the things it does so fast!
You can also find different pdf’s from Microsoft on this subject.

  1. ELT , the new ETL!

    PDW can easily load large volumes of data into your Data warehouse, that is why the basic ETL process might not be the best option for the PDW.
    In most systems today, the fastest method for data manipulation/integration is to extract your data, then transform it and then load it into you Data Warehouse. This is not the case for a PDW, because of its incredible load speed on unstructured unaltered data.
    This is why for PDW we choose to use ELT, so we first extract the data, then load it into our PDW, and then start transforming it on the PDW using CTAS statements (Create Table as Select à more info in a future blog post). Due to the MPP, these CTAS statements are a very fast process.
    Using ELT will prove to be a way better method for your Data manipulation/integration while using PDW, it is way less time-consuming then the normal ETL process!

In my future post I will become a lot more technical, explaining different user(My own fault J ) errors I encountered while configuring my first PDW, these are easy mistakes to make and once you know about them you will never make them again. The next post will be explaining all the secrets that the DWLOADER.exe has.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.


Contact Stijn now


© SQL Stijn 2017 - Privacy Policy - Cookie Policy
Manage Cookie Consent
To provide the best experiences, we use technologies like cookies to store and/or access device information.
Functional Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
Manage options Manage services Manage vendors Read more about these purposes
View preferences
{title} {title} {title}