I need to write a script that extracts information from a complex Excel 2003 file (many sheets, and several different tables within a single sheet) and produces several XML files that must validate against a provided XSD file.
My preferred language is Python; to generate and validate the XML files I would use lxml.
What do you recommend for parsing XLS files?
Is xlrd the best tool for complex Excel files?
Or do I need to convert all the sheets to CSV manually and read the files line by line, splitting each line to get the information?
I am open to C#, VB6, and VBA suggestions too.
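For the XML half of that plan, here is a minimal sketch of building a document from extracted records and checking it against an XSD. The function names, the `items`/`item` tags, and the dict-per-row record shape are all assumptions made for illustration; the real element names have to come from the provided schema. The builder uses the standard-library `xml.etree.ElementTree` so it runs without extra packages (`lxml.etree` offers the same `Element`/`SubElement` calls), and lxml is imported only inside the validation step.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="items", row_tag="item"):
    """Serialize a list of dicts to XML bytes, one child element per dict.

    Assumption: every dict key is already a valid XML element name.
    """
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for key, value in record.items():
            ET.SubElement(row, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def validate_against_xsd(xml_bytes, xsd_path):
    """Return (is_valid, error_messages) for xml_bytes checked against an XSD file.

    Requires the third-party lxml package; imported here so that the
    builder above stays usable without it.
    """
    from lxml import etree

    schema = etree.XMLSchema(etree.parse(xsd_path))
    doc = etree.fromstring(xml_bytes)
    is_valid = schema.validate(doc)
    return is_valid, [str(error) for error in schema.error_log]
```

A call such as `validate_against_xsd(records_to_xml(rows), "schema.xsd")` (the file name is a placeholder) would then report whether the generated document conforms and, if not, which schema errors were logged.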
XLSX support is in alpha test; e-mail me if you need it. The awkwardness and lossiness of the save-as-CSV approach was one of the things that prompted me to write xlrd.
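To make the xlrd route concrete, here is a minimal sketch of reading one sheet and pulling a single table out of a sheet that holds several. `load_sheet_rows` and `find_table` are hypothetical helper names, and the rule "a table starts at a known header row and ends at the first blank row" is an assumption about the layout, not something the workbook in question is known to follow. The xlrd calls used (`open_workbook`, `sheet_by_name`, `nrows`, `row_values`) are part of its documented API.

```python
def load_sheet_rows(path, sheet_name):
    """Return every row of one sheet as a list of lists (needs the xlrd package)."""
    import xlrd  # third-party; imported lazily so find_table works without it

    book = xlrd.open_workbook(path)
    sheet = book.sheet_by_name(sheet_name)
    return [sheet.row_values(rowx) for rowx in range(sheet.nrows)]


def find_table(rows, header):
    """Find the first row containing `header` as a contiguous run of cells and
    return the rows below it as dicts, stopping at the first blank row.

    Assumption: each table is introduced by a unique header row and
    terminated by a row that is empty in the table's columns.
    """
    width = len(header)
    for rowx, row in enumerate(rows):
        for colx in range(len(row) - width + 1):
            if list(row[colx:colx + width]) == list(header):
                records = []
                for data in rows[rowx + 1:]:
                    cells = list(data[colx:colx + width])
                    if all(cell in ("", None) for cell in cells):
                        break  # blank row marks the end of this table
                    records.append(dict(zip(header, cells)))
                return records
    return []  # header not found anywhere on the sheet
```

Usage would be along the lines of `find_table(load_sheet_rows("report.xls", "Summary"), ["Name", "Qty"])`, where the file, sheet, and column names are placeholders. Keeping the table search separate from the file access means it can be tried on plain lists before touching the real workbook.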
I am convinced the easiest solution for this job is Excel VBA together with the MSXML parser. Look here for some links on how to use the MSXML parser in VBA for reading XML files, and for adding or editing VBA code in Excel using C#, see this link; I think you can adapt this easily for writing XML files.
I can’t say whether xlrd/Python is the right tool for the job, as I don’t know Python well enough.
There are numerous ways to access the Excel data… first of all, you have VBA built directly into Excel.
Then you have ADO.NET. See David Hayden’s post here, which lets you access the data from any .NET language… even IronPython.
Why do you want to write functionality that already exists? I mean, Excel already has it: you can import any web page (just keep in mind that Excel uses the IE engine to render tags). Here are the steps to achieve it.
I’m not sure how you want to export HTML to Excel. In Excel we are talking about rows and columns, while HTML is a document. You would probably want to export HTML to Word or PDF. The HTML consists of an outer table with four tables inside it. The first inner table has three columns, the second has two columns, the third can have any number of columns (decided at run time), and the fourth has three columns.