Thread: Can someone help me with this problem I am trying to solve, please? 3 questions.

  1. #1
    Join Date
    Aug 2013
    Posts
    2

    The sample file below contains 3 million records and is 3 GB in size, so it cannot be opened in a standard text editor.
    HDR FILE VERSION 1.0
    D|35|07/11/1997|John|Smith|D||||06/10/1964|39:4-85|Improper Passing|||||1906||Franklin|||
    D|35|07/11/1997|Emma|Franks|G||||07/5/1951|39:3d-9|Speeding|||||1906||Bergen|||
    D|35|07/11/1997|Ed|Jobs|||||10/2/1982|39:4-56c|StopSign|||||1906||Ocean|||
    ……….
    ……….
    TRL 3,000,000 RECORDS

    Can you demonstrate programmatically how to complete the following tasks?
    1. Create 2 separate files from this single file: one new file with all the records containing the word Franklin, and a second file with everything else.
    2. Validate that the total number of lines in the original file equals the total number of lines in the two new files created in step 1.
    3. Add one additional pipe (“|”) to the end of each line in the file.

  2. #2
    Join Date
    Nov 2002
    Location
    New Jersey, USA
    Posts
    3,932
    What tools do you have for ETL purposes? If you don't have any, you may have to write a script that reads the file line by line and processes it. Perl, VBScript, or any other scripting tool of your choice can be used as well.
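
    As a rough illustration of the line-by-line approach, here is a minimal sketch using awk from a POSIX shell. It reads the input once, appends one pipe to every line, routes each line to one of two files, and then compares line counts. The function name split_records and the file names in the usage note are hypothetical, and the match is a plain substring test on the whole line.

```shell
#!/bin/sh
# split_records INPUT MATCH_FILE REST_FILE   (hypothetical helper name)
split_records() {
    # Make sure both output files exist even if one receives no lines.
    : > "$2"; : > "$3"

    # One pass over the input: append a pipe to every line and route it.
    awk -v hit="$2" -v rest="$3" '
        /Franklin/ { print $0 "|" > hit; next }   # lines containing Franklin
                   { print $0 "|" > rest }        # everything else
    ' "$1"

    # Compare line counts. wc -l counts newline characters, so this
    # assumes the input ends with a newline.
    in_lines=$(( $(wc -l < "$1") ))
    out_lines=$(( $(wc -l < "$2") + $(wc -l < "$3") ))
    echo "input=$in_lines output=$out_lines"
    [ "$in_lines" -eq "$out_lines" ]
}
```

    Usage would look like: split_records bigfile.txt franklin.txt other.txt. The function returns a non-zero status if the counts differ, so it can be chained with && or checked with $?.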

  3. #3
    Join Date
    Aug 2013
    Posts
    2
    Thank you for responding. I appreciate the help. To answer your question, I would say C#.

  4. #4
    Join Date
    Aug 2013
    Posts
    1
    Assuming your file is called bigfile.txt, using the Linux/UNIX command line:

    grep "Franklin" bigfile.txt | sed 's/|\s*$/||/g' > franklin.txt
    grep -v "Franklin" bigfile.txt | sed 's/|\s*$/||/g' > non-franklin.txt
    # the bigfile.txt count should equal the sum of the other two counts
    wc -l bigfile.txt franklin.txt non-franklin.txt

    Caveats:

    1. Your file has a header and a footer. As written, both end up in non-franklin.txt, so you may have to generate them for each split file.
    2. The word Franklin is matched anywhere on the line, so a record with Franklin as a name would be selected as well. Adjust the regular expression for grep accordingly if that is not desired.
    3. The first line of data in your example ends with a space but the others do not. The extra pipe will be added without the space. If the space was accidental, you can remove the \s* portion of the matching expression.
    4. If you don't have Linux, you can install it for free or download Cygwin for Windows (also free).
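
    If caveats 1 and 2 matter, one possible variation is to compare a single field instead of the whole line and to leave the header and footer out of both outputs. The sketch below assumes that the value to match sits in field 19, which is where Franklin, Bergen and Ocean appear in the sample rows, and that the header and footer lines start with "HDR " and "TRL ". The function name split_by_field is hypothetical. It does not add the extra pipe; the sed step can still be applied to its output.

```shell
#!/bin/sh
# split_by_field INPUT MATCH_FILE REST_FILE   (hypothetical helper name)
split_by_field() {
    # Make sure both output files exist even if one receives no lines.
    : > "$2"; : > "$3"

    awk -F'|' -v hit="$2" -v rest="$3" '
        /^HDR / || /^TRL /  { next }              # skip header and footer lines
        $19 == "Franklin"   { print > hit; next } # exact match on field 19 (assumed position)
                            { print > rest }      # all other data records
    ' "$1"
}
```

    With this version the two output files together hold two lines fewer than the original, because the header and footer are skipped, so the line-count check has to allow for that.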
