• Home
  • About
  • Blog
  • Speaking
  • Contact

Load your data into your PDW (part 1 the DWLOADER)

7 January 2014SQL StijnGeen categorie, MS PDWNo Comments

Today I will enlighten you about the incredible fast load times which you will get with a Microsoft Parallel Data Warehouse!

The first thing you have to do when you work with a MSPDW is load your data into the PDW. This initial load will be the most “time consuming” so with this post I hope to help you avoid common mistakes which I made myself with the DWLOADER statement.

The first way to insert data is by developing SSIS packages and then inserting the data into the PDW using the package, I will not be going in deeper on this subject since I prefer working with the second way.

The second way is an exclusive feature for PDW: DWLOADER.EXE. This Command Line based loading tool will ensure the fastest loading times using the infiniband connection. There are many properties to configure DWLOADER, and in this post I will explain how changing these properties will make your load a success and affect your load speeds.

I will start off with giving you guys a sample dwloader statement and from this statement I’ll explain the different values you can configure to increase your loading speed, and I’ll explain the key properties of the DWLOADER (without these properties your load will fail!)

  •  dwloader.exe -M fastappend -m -b 2000000 -i FILEPATH\FILE -T TABLE -R FILEPATH\FILENAME.bad -t \t -r \r\n -dt FILEPATH\dateformatDR.txt -E -S 172.16.254.5 -U BatchUser -P B@tch1234 > “FILEPATH\DWLOADLOGvx.LOG”

  1. Dwloader.exe : this is used to call the dwloader program
  2. –M Fastappend: using fastappend will increase your loadrate significantly! Other options are Append Upsert & reload.
  3. –b: this will define after how many rows you want to insert this load in the columnstore table. It is better to split this up using a custom batch size to avoid tupple mover(tupple mover = stores data in rowstore to dump them afterwards in columnstore) . So it might be interesting to tweak your batch size so that you don’t trigger this rowstore and directly insert into the columnstore.

  4. –i: The path of the file you want to load into your PDW (Best on a location which is connected to the Infiniband to avoid the bottleneck of your ethernet connection)
  5. –T: Is the destination table created in your PDW in which DWLOADER will insert the data.
  6. –R: is the destinationpath were you will keep your log file if something goes wrong with your load (Values that can’t be loaded f.e. wrong dateformat)
  7. –t: is the delimiter which is used in the file (in the case of the sample statement above this is /t which means it’s a tab delimited file)
  8. \r: will define the end of your row. This is defined by \n in the example file which is enter.
  9. –D: If you don’t have multiple dateformats in your file, you can define the default format for your date fields by using the –D property f.e. –D “DD-mm-YYYY”
  10. –dt: If you have multiple date formats in your file you will have to add the format for every datefield in a txt file in which you put the name of the field followed by the format it uses.
    f.e. Birthdate= yyyy-MM-dd
    Startdate= yyyy-MM-dd HH:mm:ss.fff

  11. –S: will define which connection will be used to transfer the data from your flat file to your PDW, be sure to check the IP’s of your Ethernet & Infiniband connection. Using the IP-address of your Infiniband will ensure much faster load speeds
  12. –E: if you have empty values in your Flat File you will have to use –E which will convert these empty values to NULL. Without this property your load will fail!
  13. –U & -P: will define your username and password for your PDW
  14. For additional logging you will have to use > this way you can monitor your load speed a lot better and spot problems a lot faster!

This was an example of a sample DWLOADER statement, however this statement does not guarantee the best loading speed. This is defined by a combination of your dwloader configuration / flat file configuration / Table design. In my next post I will be going in deeper on table design in a PDW explaining the Distribution & Replication theory and how this will affect your load speed.

Tags: DWLOADER, MICROSOFT, PDW, SQL

Related Articles

Openrowset across the CMS: The Proxy Way of Life

20 March 2015SQL Stijn

THE SQL SERVER MISCONFIGURATION CHRONICLES: PART 2 Instance Memory Settings

17 December 2015SQL Stijn

Migration Gone Wrong! The Fallback Scenario

23 February 2016SQL Stijn

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Recent Posts

  • Building a Modern Lakehouse Data Warehouse with Azure Synapse Analytics: Moving your Database to the lake
  • Azure Synapse Dedicated SQL Pools: Workload Management
  • Synapse Spark – Playing with Data Frames with Python
  • Synapse Spark – Reading CSV files from Azure Data Lake Storage Gen 2 with Synapse Spark using Python
  • Azure Synapse Analytics – Serverless SQL Pools: Data Elimination

Recent Comments

    Archives

    • June 2022
    • December 2021
    • May 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • August 2016
    • July 2016
    • March 2016
    • February 2016
    • December 2015
    • October 2015
    • August 2015
    • March 2015
    • December 2014
    • November 2014
    • July 2014
    • February 2014
    • January 2014
    • December 2013

    Categories

    • Azure Synapse Analytics
    • Building a Modern Lakehouse Data Warehouse
    • Geen categorie
    • MS PDW
    • SQL 2016
    • SQL Maintenance
    • SQL Server Migration
    • SQL Server Misconfiguration Chronicles
    • SQL2014
    • Troubleshooting

    Contact Stijn now


    © SQL Stijn 2017 - Privacy Policy - Cookie Policy
    Manage Cookie Consent
    To provide the best experiences, we use technologies like cookies to store and/or access device information.
    Functional Always active
    The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
    Preferences
    The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
    Statistics
    The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
    Marketing
    The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
    Manage options Manage services Manage vendors Read more about these purposes
    View preferences
    {title} {title} {title}