• Home
  • About
  • Blog
  • Speaking
  • Contact

Basic introduction to MS PDW

13 December 2013SQL StijnGeen categorie, MS PDWNo Comments

Since I have been working with a Parallel Data Warehouse during the past month, it was only logical that my next post would be on this subject. Because a PDW is a relatively new product from Microsoft, I thought an introduction post would be in order. In this post I’ll give you a small amount of info about the hardware/architecture a PDW uses, and the new concept of ELT vs ETL

  1. The hardware!

    The PDW uses the INFINIBAND network connection, this great new piece of technology ensures about 25gbps, which makes loads and data traffic of the PDW lightning fast! With this new technology you will be able to upload giant amounts of data in your PDW in minutes instead of hours!

    The PDW uses MPP, which means he splits up the work he is doing under different nodes, in my case, this is on 2 compute and 1 control node (which have their own server/processors/cores). (Quarter PDW) This kind of architecture makes your (complex) queries 10 times faster. Check the table for the different options of PDW configuration. (Source = http://saldeloera.wordpress.com/2012/07/09/lesson-1-of-parallel-data-warehouse-basic-architecture-overview/)

Configuration Servers Processors Cores Space (Tb)
HP PDW  Full Rack 17 22 132 125
HP PDW  Full Rack with 4 Data Racks 47 82 492 500
HP PDW  Half Rack 11 8 48 15-60 (Optional disc sizes   available)
  1. ELT , the new ETL!

    PDW can easily load large volumes of data into your Data warehouse, that is why the basic ETL process might not be the best option for the PDW.
    In most systems today, the fastest method for data manipulation/integration is to extract your data, then transform it and then load it into you Data Warehouse. This is not the case for a PDW, because of its incredible load speed on unstructured unaltered data.
    This is why for PDW we choose to use ELT, so we first extract the data, then load it into our PDW, and then start transforming it on the PDW using CTAS statements (Create Table as Select à more info in a future blog post). Due to the MPP, these CTAS statements are a very fast process.
    Using ELT will prove to be a way better method for your Data manipulation/integration while using PDW, it is way less time-consuming then the normal ETL process!

In my future post I will become a lot more technical, explaining different user(My own fault J ) errors I encountered while configuring my first PDW, these are easy mistakes to make and once you know about them you will never make them again. The next post will be explaining all the secrets that the DWLOADER.exe has.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Recent Posts

  • Building a Modern Lakehouse Data Warehouse with Azure Synapse Analytics: Moving your Database to the lake
  • Azure Synapse Dedicated SQL Pools: Workload Management
  • Synapse Spark – Playing with Data Frames with Python
  • Synapse Spark – Reading CSV files from Azure Data Lake Storage Gen 2 with Synapse Spark using Python
  • Azure Synapse Analytics – Serverless SQL Pools: Data Elimination

Recent Comments

    Archives

    • June 2022
    • December 2021
    • May 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • August 2016
    • July 2016
    • March 2016
    • February 2016
    • December 2015
    • October 2015
    • August 2015
    • March 2015
    • December 2014
    • November 2014
    • July 2014
    • February 2014
    • January 2014
    • December 2013

    Categories

    • Azure Synapse Analytics
    • Building a Modern Lakehouse Data Warehouse
    • Geen categorie
    • MS PDW
    • SQL 2016
    • SQL Maintenance
    • SQL Server Migration
    • SQL Server Misconfiguration Chronicles
    • SQL2014
    • Troubleshooting

    Contact Stijn now


    © SQL Stijn 2017 - Privacy Policy - Cookie Policy
    Manage Cookie Consent
    To provide the best experiences, we use technologies like cookies to store and/or access device information.
    Functional Always active
    The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
    Preferences
    The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
    Statistics
    The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
    Marketing
    The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
    Manage options Manage services Manage vendors Read more about these purposes
    View preferences
    {title} {title} {title}