
If you are using DOCX you can always use the Open XML SDK from Microsoft; it's pretty easy to use and clean.

A sample taken from MSDN:

    // Imports the sample needs. HtmlConverter and HtmlConverterSettings come from
    // the PowerTools for Open XML code (PtOpenXmlUtil.cs is part of it).
    using System.IO;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using OpenXmlPowerTools;

    // This example shows the simplest conversion. No images are converted.
    byte[] byteArray = File.ReadAllBytes("Test.docx");
    using (MemoryStream memoryStream = new MemoryStream())
    {
        // Copy the file into a resizable stream so the package can be opened for editing.
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
        {
            HtmlConverterSettings settings = new HtmlConverterSettings();
            XElement html = HtmlConverter.ConvertToHtml(doc, settings);

            // Note: the XHTML returned by ConvertToHtmlTransform contains objects of type
            // XEntity. PtOpenXmlUtil.cs defines the XEntity class.
            // If you further transform the XML tree returned by ConvertToHtmlTransform, you
            // must do it correctly, or entities do not serialize properly.
            File.WriteAllText("Test.html", html.ToStringNewLineOnAttributes());
        }
    }

You might also want to take a look at Word Automation Services.

If your boss is dead-set on displaying it in HTML, then getting the HTML generated by the Word doc into your database is the hardest part of the project. You have a couple of workflows to choose from, but they go something like this:

HTML > user uploads doc to database through an app you create > web app pulls the HTML from the database to display on the web page
Doc > user uploads doc through an app you create > the app converts the doc on the fly and then inserts the HTML into the database > web app pulls the HTML from the database to display on the web page
Doc file to database > web app pulls the doc and converts it on the fly when it's requested by a web page

Unfortunately, you're in for a bit of tomfoolery no matter which workflow you choose. Another answer suggested using a 3rd party tool, which I completely agree with as being the best way to handle the conversion (if you don't require your users to save their docs to HTML). Also, be aware that images in Word documents can be problematic once you've converted to HTML (they aren't preserved in the generated file, which means more /sarcasm/ fun for you on the web dev side). If your boss doesn't want to foot the bill for a 3rd party converter, you can attempt to handle the conversion on your own with the Office.Interop namespace.
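
For the Office.Interop route mentioned above, here is a rough sketch of what the conversion could look like. It assumes Word is installed on the machine doing the conversion and that the project references Microsoft.Office.Interop.Word; the class name, method name and file paths are made up for illustration. I haven't tuned it for server use, so treat it as a starting point rather than a finished converter.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: DocToHtml and Convert are illustrative names, not part of any library.
static class DocToHtml
{
    public static void Convert(string docPath, string htmlPath)
    {
        // Starts a hidden Word instance; this requires Word to be installed.
        Word.Application app = new Word.Application();
        app.Visible = false;
        Word.Document doc = null;
        try
        {
            // COM automation wants absolute paths.
            doc = app.Documents.Open(FileName: Path.GetFullPath(docPath), ReadOnly: true);

            // Filtered HTML drops most of the Office-specific markup.
            doc.SaveAs2(FileName: Path.GetFullPath(htmlPath),
                        FileFormat: Word.WdSaveFormat.wdFormatFilteredHTML);
        }
        finally
        {
            // Always close the document and quit Word, or WINWORD.EXE processes pile up.
            doc?.Close(SaveChanges: false);
            app.Quit();
        }
    }
}
```

You would call it with something like DocToHtml.Convert("Test.doc", "Test.html"). As far as I know, Word writes any images into a companion folder next to the HTML file instead of into the file itself, so if you store the HTML in a database you still have to deal with the images separately.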
