Channel: VBForums - ASP, VB Script

Best way to read and split a large file into multiple small files

Hi All,

I have an XML file larger than 100 MB (more than 20 lakh, i.e. 2 million, lines) that I cannot pass directly to one of my processes. I need to split it into multiple smaller files at some separator tag. I tried FileSystemObject in VBScript as well as a batch file. Both take more than 8 minutes to read the input and create the first small file of 10,000 lines.

Please suggest some better-performing options for this task.

Appreciate your help.

Method 1:
Code:

Function SplitXML()
    Dim oSrcFile, oTgtFile, strHeader, intFiles, strContent, intSize
    Dim arrLines()
    Set oFSO = CreateObject("Scripting.FileSystemObject")
 
    strFilePath = Application.GetOpenFilename
    intLinesToSplit = InputBox("Enter the No of Lines to split with for each file:", "XML Splitter", 10000)
    strTgtPath = Replace(strFilePath, oFSO.GetFileName(strFilePath), "")
    strFileName = Replace(oFSO.GetFileName(strFilePath), ".xml", "")
    Set oSrcFile = oFSO.OpenTextFile(strFilePath)
 
   
    'Headers
    strHeader = "": strContent = ""
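    Do
        strTemp = oSrcFile.ReadLine
        'Avoid a leading blank line, which would end up before any XML declaration
        If strHeader = "" Then
            strHeader = strTemp
        Else
            strHeader = strHeader & vbNewLine & strTemp
        End If
    Loop While InStr(1, strTemp, "</BLHeader>", 1) <= 0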
 
    'Split
    intTemp = 0: intFiles = 0: blnNewFile = True: intSize = -1
    Do While Not oSrcFile.AtEndOfStream
        intTemp = intTemp + 1
     
        'Content
        'intSize = intSize + 1
        'ReDim Preserve arrLines(intSize)
        'arrLines(intSize) = oSrcFile.ReadLine
        strTemp = oSrcFile.ReadLine
        strContent = strContent & vbNewLine & strTemp
     
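        If intTemp >= intLinesToSplit Then
            If InStr(1, strTemp, "</EndingTag>", 1) > 0 Then
                'Add Header
                intFiles = intFiles + 1
                Set oTgtFile = oFSO.CreateTextFile(strTgtPath & strFileName & "_" & intFiles & ".xml", True)
                oTgtFile.WriteLine strHeader

                'Add Content
                oTgtFile.WriteLine strContent 'Join(arrLines, vbNewLine)

                'Add tail
                oTgtFile.WriteLine "</FinalFileTag>"
                oTgtFile.Close

                'Reset for the next chunk; without this every later file
                'repeats all earlier content and strContent keeps growing
                strContent = ""
                intTemp = 0
            End If
        End If
    Loop

    'Write any remaining lines; this assumes the source's own closing tag
    'is already part of the remainder, so no tail is added
    If strContent <> "" Then
        intFiles = intFiles + 1
        Set oTgtFile = oFSO.CreateTextFile(strTgtPath & strFileName & "_" & intFiles & ".xml", True)
        oTgtFile.WriteLine strHeader
        oTgtFile.WriteLine strContent
        oTgtFile.Close
    End If

    oSrcFile.Close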
End Function
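
Method 1 builds each chunk by appending every line to strContent. Appending to an ever-longer string copies the whole string each time, so the cost grows roughly quadratically with the chunk size, and that is a likely cause of the long runtime. A minimal sketch of a streaming alternative follows: each line is written to the current output file as soon as it is read. The function name SplitXmlStreaming and its parameters are hypothetical, and the sketch assumes the header ends on the line containing the header end tag, each record ends on a line containing the record end tag, and the root closing tag sits on a line of its own.

```vbscript
Option Explicit

' Streams strFilePath into numbered chunk files next to it and returns the chunk count.
' Assumptions (not confirmed by the post): the header ends on the line containing
' strHeaderEnd, each record ends on a line containing strRecordEnd, and strRootClose
' appears on a line of its own.
Function SplitXmlStreaming(strFilePath, lngLinesPerFile, strHeaderEnd, strRecordEnd, strRootClose)
    Const ForReading = 1
    Dim oFSO, oSrc, oTgt, strHeader, strLine, strBase, lngLines, lngFiles

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    strBase = oFSO.BuildPath(oFSO.GetParentFolderName(strFilePath), oFSO.GetBaseName(strFilePath))
    Set oSrc = oFSO.OpenTextFile(strFilePath, ForReading)

    ' The header is short, so holding it in a string is cheap.
    strHeader = ""
    Do While Not oSrc.AtEndOfStream
        strLine = oSrc.ReadLine
        If strHeader <> "" Then strHeader = strHeader & vbNewLine
        strHeader = strHeader & strLine
        If InStr(1, strLine, strHeaderEnd, vbTextCompare) > 0 Then Exit Do
    Loop

    Set oTgt = Nothing
    lngLines = 0: lngFiles = 0
    Do While Not oSrc.AtEndOfStream
        strLine = oSrc.ReadLine
        ' The root closing tag is re-added per chunk, so skip the source's copy.
        If InStr(1, strLine, strRootClose, vbTextCompare) = 0 Then
            If oTgt Is Nothing Then
                ' Open the next chunk lazily and start it with the header.
                lngFiles = lngFiles + 1
                Set oTgt = oFSO.CreateTextFile(strBase & "_" & lngFiles & ".xml", True)
                oTgt.WriteLine strHeader
            End If
            oTgt.WriteLine strLine   ' written immediately, never accumulated
            lngLines = lngLines + 1
            If lngLines >= lngLinesPerFile Then
                ' Only cut the chunk at a record boundary.
                If InStr(1, strLine, strRecordEnd, vbTextCompare) > 0 Then
                    oTgt.WriteLine strRootClose
                    oTgt.Close
                    Set oTgt = Nothing
                    lngLines = 0
                End If
            End If
        End If
    Loop

    ' Close the last, possibly shorter, chunk.
    If Not oTgt Is Nothing Then
        oTgt.WriteLine strRootClose
        oTgt.Close
    End If
    oSrc.Close
    SplitXmlStreaming = lngFiles
End Function

' Example call (hypothetical path), using the tag names from Method 1:
' WScript.Echo SplitXmlStreaming("C:\data\big.xml", 10000, "</BLHeader>", "</EndingTag>", "</FinalFileTag>")
```

Run under cscript, this keeps memory use flat because only the header and the current line are held in strings. Whether it is fast enough for a 100 MB input would need to be measured.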

Method 2:
Code:

@echo off
setlocal EnableDelayedExpansion

set InFile=c:\ee\EE28352646\in.txt
set OutDir=c:\ee\EE28352646
REM Must be less than 2147483647: set /a uses signed 32-bit integers and the script computes MaxLines+1
set MaxLines=1000000

if not exist "%InFile%" (
  echo *ERROR* Input file does not exist!
  exit /b
)

if not exist "%OutDir%\" (
  echo *ERROR* Output folder does not exist!
  exit /b
)

for %%A in ("%InFile%") do (
  set Name=%%~nA
  set Ext=%%~xA
)

set /a Line=MaxLines+1
set File=0
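REM Note: for /f skips blank lines, tokens=* strips leading whitespace, and with
REM delayed expansion enabled any exclamation marks in a line are lost.
REM Appending with a redirect reopens the output file for every line, which is slow.
for /f "usebackq tokens=*" %%A in ("%InFile%") do (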
  set /a Line+=1
  if !Line! GTR %MaxLines% (
    set /a File+=1
    set OutFile=%OutDir%\%Name%_!File!%Ext%
    if exist "!OutFile!" del "!OutFile!"
    set Line=1
  )
  echo.%%A>>"!OutFile!"
)

