Bulk insert files into SharePoint with metadata

Hello everyone,

I am creating an application that will migrate about 16k documents and their metadata from a legacy system to SharePoint.

The extract from the legacy system is organized as one folder per document, containing the document itself (a PDF file) and an XML file with its metadata.

I built the whole upload process and it works fine, but it is far too slow...

I split the operations: first I upload all the files, then I update their metadata (a sketch of that second phase is below, after my upload code). I also split the files into batches of 1,000 items.

File uploading takes more time as the list fills up. At the beginning I needed 15 minutes for 1,000 files; now that the list already contains 3,000 files, it takes about an hour...

I check whether the file already exists before uploading it, because I need to report an error if the file is already in the list (duplicate detection).

Is there any way to improve the performance of the system?

I also have another issue: my tool uses more and more RAM as the list grows. After the 5,000th file, my tool is using over 1 GB of RAM. Could it be because I use a single SPSite instance for the whole upload? Should I recreate it during the upload?

Here is the code I use in order to upload the files:

using (var web = _currentSite.OpenWeb())
{
   var library = web.GetList(libraryName);

   // Build the server-relative URL of the target file inside the library.
   var relativeFileUrl = string.Format("{0}/{1}",
      library.RootFolder.ServerRelativeUrl, fileName);

   // Duplicate detection: fail if the file is already in the list.
   if (web.GetFile(relativeFileUrl).Exists)
      throw new InvalidOperationException(
         string.Format("The file '{0}' already exists", fileName));

   // Add the file to the library's root folder without overwriting.
   library.RootFolder.Files.Add(relativeFileUrl, fileStream, false);
}
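For reference, here is a simplified sketch of the second phase (the metadata update). The column names ("Title", "LegacyId") and the XML layout are placeholders rather than my real schema, and XDocument requires a reference to System.Xml.Linq:

using (var web = _currentSite.OpenWeb())
{
    // Re-open the uploaded file and get its backing list item.
    var item = web.GetFile(relativeFileUrl).Item;

    // Hypothetical XML layout: <Document><Title>...</Title><LegacyId>...</LegacyId></Document>
    var metadata = XDocument.Load(xmlFilePath);
    item["Title"] = (string)metadata.Root.Element("Title");
    item["LegacyId"] = (string)metadata.Root.Element("LegacyId");

    // SystemUpdate(false) writes the fields without creating a new version
    // and without touching the Modified/Editor values.
    item.SystemUpdate(false);
}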

Thanks a lot for your help!

With kind regards

Carlos

December 20th, 2012 4:33pm

Hi Carlos,

For this issue, based on the code you provided, you seem to open a new web object for every file you upload. Why don't you use one web object, iterate over the collection of files to upload, and do the upload operations within that single web? Something like this:

using (var web = _currentSite.OpenWeb())
{
    var library = web.GetList(libraryName);
    var rootFolderUrl = library.RootFolder.ServerRelativeUrl;

    foreach (var fileName in fileCollection)
    {
        var relativeFileUrl = string.Format("{0}/{1}", rootFolderUrl, fileName);

        // Keep the duplicate check, reusing the same web object.
        if (web.GetFile(relativeFileUrl).Exists)
            throw new InvalidOperationException(
                string.Format("The file '{0}' already exists", fileName));

        // sourceFolder: the local folder holding the extracted PDFs.
        using (var fileStream = File.OpenRead(Path.Combine(sourceFolder, fileName)))
        {
            library.RootFolder.Files.Add(relativeFileUrl, fileStream, false);
        }
    }
}

Thanks,

December 25th, 2012 12:33pm

Hi,

Also, make sure you call the Dispose() method on your FileStream object after uploading each file.
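For example, a using block takes care of that automatically (localFilePath and the library variable here are just placeholders):

// Dispose() is called automatically when the using block exits,
// even if the upload throws.
using (var fileStream = File.OpenRead(localFilePath))
{
    library.RootFolder.Files.Add(relativeFileUrl, fileStream, false);
}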

December 25th, 2012 9:02pm

I used a new SPWeb for each upload in order to have a kind of clean context, but I guess I could just use the same SPWeb for all the uploads, or at least one per batch (see the sketch below).

Is that the only performance tweak I could implement?

The Streams are disposed properly.
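Something like this is what I have in mind: one SPWeb per batch of 1,000 files, disposed between batches to keep memory bounded (allFileNames and the UploadFile helper are placeholders for my actual code):

// Process the files in batches; each batch gets its own SPWeb that is
// disposed at the end of the batch to release its SPRequest resources.
const int batchSize = 1000;

foreach (var batch in allFileNames
    .Select((name, i) => new { name, i })
    .GroupBy(x => x.i / batchSize, x => x.name))
{
    using (var web = _currentSite.OpenWeb())
    {
        var library = web.GetList(libraryName);
        foreach (var fileName in batch)
        {
            UploadFile(web, library, fileName); // upload + duplicate check as above
        }
    }
}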

Thanks in advance!

December 31st, 2012 3:43am

Hello friend,

Can you try declaring all the variables at the top and reusing them?

I think it will solve your problem.

Regards,

Manish

December 31st, 2012 7:40am

Hi,

So after a lot of digging through the logs and so on, it turned out my application was not responsible for the lag on the upload.

Basically, the list I was uploading to had an event receiver attached which suffered from a memory leak and was not using the most efficient methods to retrieve data, etc.

The lag came from the event receiver. The shame of it is that I was the one who developed that event receiver, hahaha :)

Anyway, once the event receiver was fixed, I got much better results: an upload speed of about 600 files per 10 minutes, which is totally acceptable for me!

Along the way I also learned the following two facts that might interest people in this situation:

Event receiver code is loaded in the context of the console application process that updates the list. At first I thought the event receiver would be called in an RPC fashion and would live in the web application process. My mistake :)

My event receiver responds to the ItemAdded and ItemUpdated events, which are asynchronous. Once the console application finishes its processing, it shuts down all the threads created by the event receiver, even if they have not finished their work!

As there is no way to check from code whether the event receiver has finished, and after searching the web for a couple of hours, the only solution I found is to put a Thread.Sleep at the end of the console application, letting the event receiver's thread pool threads finish their work instead of being killed (a minimal sketch is below).

Normally the wait time should not be too long, unless the event receiver is doing really heavy work or suffers from a huge memory leak (sounds familiar... :p).
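For illustration, here is a minimal sketch of the end of the console application; the 60-second wait is an arbitrary value that has to be tuned to how long the receiver actually needs:

// Give the asynchronous ItemAdded/ItemUpdated receiver threads time to
// finish before the process exits and kills them.
Console.WriteLine("Upload done, waiting for event receiver threads...");
Thread.Sleep(TimeSpan.FromSeconds(60));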

Hope this helps others solve similar issues.


  • Edited by c.da.silva Wednesday, January 09, 2013 1:39 PM
  • Marked as answer by c.da.silva Wednesday, January 09, 2013 1:40 PM
January 9th, 2013 4:37pm

Hello,

We are new to coding and we have a similar case of uploading PDFs and XML metadata. Can you share your complete code and the steps to run it at our end? Thanks for your support.

With Kind Regards,

Mohammed .

January 28th, 2014 4:06am

Dear Manish,

We are new to coding and we have to upload PDFs and XML metadata to SharePoint. Can you guide us on where to begin and how to do this? Thanks for your support.

With Kind Regards,

Mohammed .

January 28th, 2014 4:11am
