
How to achieve Ultimate PDW load speed (DWLOADER PART 2)

24 January 2014 | SQL Stijn | MS PDW

In previous posts I talked about DWLoader and the distribution & replication theory; today I'm going to explain how you can combine that knowledge to achieve the ultimate PDW load speed with DWLoader.

There are three things that determine your load speed:

  1. File size
  2. Destination Table Design
  3. Structure of the file

 

I will start off with how file size affects your load speed. Say you have a fact table file of about 1 000 000 000 rows: if you load that file straight into your PDW with DWLoader, you will see that the load speed does not give you the results you would expect while working over InfiniBand. How did I solve this? The solution is quite straightforward: use a file-split program to split your file into the desired number of rows; I tend to use files of 100 000 000 rows each, which you can produce with a tool such as GSplit. After splitting the file, I loaded the pieces in parallel from a batch script to max out the InfiniBand.
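As an illustration, here is a minimal sketch of such a parallel load from a Windows batch script. The appliance address, credentials, table name, delimiters and file paths are all placeholders, and the exact dwloader switches can vary between PDW versions, so treat this as an outline rather than a finished script.

    @echo off
    REM Hypothetical example: pre-split files FactSales_*.txt are loaded in parallel.
    REM -S appliance, -U/-P credentials, -i input file, -T target table (db.schema.table),
    REM -t/-r field and row delimiters, -R reject file, -M load mode.
    for %%f in (C:\loads\FactSales_*.txt) do (
        start "load %%~nf" dwloader.exe -S 10.0.0.1 -U loaduser -P ********* ^
            -i "%%f" -T MyDW.dbo.FactSales_Staging ^
            -t "|" -r "\r\n" -R "C:\loads\rejects\%%~nf.bad" -M append
    )

Each start command launches its own dwloader process, so the number of files you split into effectively becomes your degree of load parallelism.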

Now that you have split up your file, we can move on to the second thing that affects your load speed: your table design, and more specifically your choice of distribution key. If you want more information on how to select a good distribution key, I suggest you read my previous post (http://swyssql.wordpress.com/2014/01/21/pdw-distribution-replication/). If you have followed the four steps for selecting your distribution key, your load speed will not be held back by your table design; but if you notice that your load speed is a lot slower than expected, it is worth taking another look at the key you chose.
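For reference, the distribution key is simply declared in the table's DDL. Below is a sketch with made-up table and column names, using the PDW-style CREATE TABLE syntax to hash-distribute the fact table on the key you selected with those four steps.

    -- Hypothetical fact table: hash-distributed on a high-cardinality,
    -- evenly spread column so rows land evenly across all distributions.
    CREATE TABLE dbo.FactSales
    (
        SaleKey      BIGINT        NOT NULL,
        CustomerKey  INT           NOT NULL,
        DateKey      INT           NOT NULL,
        Amount       DECIMAL(18,2) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(CustomerKey),  -- the distribution key chosen for this table
        CLUSTERED COLUMNSTORE INDEX
    );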

The last thing that defines your load speed is the structure of your file. As you know, PDW achieves incredible results when it comes to processing unstructured data; however, loading data that is already structured can hurt your load speed. What you have to avoid is choosing a distribution key on which your source file is already ordered or grouped. This brings me to the most important conclusion I reached while tuning DWLoader: sometimes it is better to load your data into PDW through a staging table with a different distribution key than your ideal one, and afterwards use the CTAS magic to transform it into a table with the best query-performing distribution key (the ELT principle: http://swyssql.wordpress.com/2013/12/13/basic-introduction-to-ms-pdw/).
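Here is a minimal sketch of that staging-plus-CTAS pattern, again with hypothetical table and column names: DWLoader targets a staging table distributed on a column that is not ordered in the source file, and a CTAS statement then rebuilds the data on the ideal, best query-performing key.

    -- Step 1: DWLoader loads into this staging table. SaleKey is assumed to be
    -- well spread (not sorted or grouped) in the source file, so the load parallelises.
    CREATE TABLE dbo.FactSales_Staging
    (
        SaleKey      BIGINT        NOT NULL,
        CustomerKey  INT           NOT NULL,
        DateKey      INT           NOT NULL,
        Amount       DECIMAL(18,2) NOT NULL
    )
    WITH (DISTRIBUTION = HASH(SaleKey), CLUSTERED COLUMNSTORE INDEX);

    -- Step 2: CTAS redistributes the staged rows onto the ideal query key.
    CREATE TABLE dbo.FactSales_Final
    WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
    AS
    SELECT * FROM dbo.FactSales_Staging;

Once the CTAS completes you can drop the staging table and point your queries at the new table.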

If you keep these three hints in mind, you should be able to achieve your ultimate DWLoader speed (1.3 GBps is what we have achieved so far 🙂).

In future posts I will be talking about the PolyBase feature of PDW and how to set up your very own Hadoop cluster to combine it with Big Data using external table commands.

Stay tuned!

Tags: DWLOADER, MICROSOFT, PDW, SQL
