Wednesday, 24 October 2012

Read Word files using PHP

In this PHP tutorial we will see how to show the contents of Word documents (.doc and .docx) in the browser. PHP has no built-in function for reading Word files, so this is generally not straightforward, whereas PDFs can easily be embedded in HTML.

Here we will see two different methods that work well for displaying the text of .doc and .docx files.

To create .docx files with PHP you can use phpdocx. If you want to create PDFs, you can use FPDF or mPDF.

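I suggest you prefer method 1. Note that it needs PHP running on Windows with MS Word installed, because it drives Word through COM.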

Method 1: Use a COM object to read MS Word files. This works well with both .docx and .doc.


<div style="border:2px solid #1a4572; width:720px;padding:15px">
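<?php

$filename = 'msword.docx';

try {
    // Needs Windows, an installed copy of MS Word and the com_dotnet extension.
    // CP_UTF8 makes COM return strings as UTF-8.
    $word = new COM('word.application', null, CP_UTF8);
} catch (com_exception $e) {
    // new COM() throws on failure, so "or die" would never run.
    die('Could not initialise MS Word object.');
}

// Pass an absolute path; Word does not share the script's working directory.
$word->Documents->Open(realpath($filename));

// Extract the document text.
$content = (string) $word->ActiveDocument->Content->Text;

// Escape the text before turning line breaks into <br> tags.
echo nl2br(htmlspecialchars($content));

// Close the document without saving changes, then quit Word.
$word->ActiveDocument->Close(false);
$word->Quit();
$word = null;
?>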
</div>
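
Since method 1 depends on COM being available, a script may want to check for it before choosing how to read a file. The following is only a rough sketch: `chooseWordReader()` and its return values `'com'` and `'binary'` are names made up for this example, and `class_exists('COM')` only tells you that the extension is loaded, not that MS Word is actually installed.

```php
<?php

/**
 * Hypothetical helper: decide which reading method fits a given file.
 * Returns 'com' (COM object), 'binary' (raw .doc reading) or null
 * when neither approach applies.
 */
function chooseWordReader(string $filename, ?bool $comAvailable = null): ?string
{
    // By default, detect COM support. The COM class only exists on
    // Windows builds of PHP with the com_dotnet extension enabled.
    if ($comAvailable === null) {
        $comAvailable = class_exists('COM');
    }

    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    if (!in_array($extension, ['doc', 'docx'], true)) {
        return null; // not a Word document
    }

    if ($comAvailable) {
        return 'com'; // the COM approach handles both .doc and .docx
    }

    // Without COM, only the raw reading of .doc files remains.
    return $extension === 'doc' ? 'binary' : null;
}

// Example call with COM availability passed in explicitly.
var_dump(chooseWordReader('msword.docx', true));
```

The second parameter is there only so the decision can be tried out on a machine without COM; in normal use you would leave it out.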


Method 2: Read the raw bytes of the file. This works well with .doc, but not with .docx

<?php

$filename = 'msword.doc';

if (file_exists($filename))
{
    // Open in binary mode so the bytes are read unchanged on Windows.
    if (($fh = fopen($filename, 'rb')) !== false)
    {
       // The first 0xA00 (2560) bytes are treated as the header;
       // the text is assumed to start right after them.
       $headers = fread($fh, 0xA00);

       if ($headers !== false && strlen($headers) === 0xA00)
       {
          // Bytes 0x21C to 0x21F hold a number, lowest byte first.
          // Lowest byte, minus 1
          $n1 = ( ord($headers[0x21C]) - 1 );

          // Second byte, minus 8, counts in units of 256
          $n2 = ( ( ord($headers[0x21D]) - 8 ) * 256 );

          // Third byte counts in units of 256 * 256
          $n3 = ( ( ord($headers[0x21E]) * 256 ) * 256 );

          // Fourth byte counts in units of 256 * 256 * 256
          $n4 = ( ( ( ord($headers[0x21F]) * 256 ) * 256 ) * 256 );

          // Total length of text in the document
          $textLength = ($n1 + $n2 + $n3 + $n4);

          // fread() needs a length greater than zero
          if ($textLength > 0)
          {
             $extracted_plaintext = fread($fh, $textLength);

             // Print the character stream without new lines:
             //echo $extracted_plaintext;

             // To show each paragraph on a new line:
             echo nl2br($extracted_plaintext);
             // For more spacing after each paragraph, apply nl2br() again.
          }
       }

       fclose($fh);
    }
}
?>
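
The byte arithmetic in method 2 may be easier to follow when written as a single step. The four bytes at 0x21C to 0x21F form a 32-bit number stored lowest byte first, and the "minus 1" and "minus 8" adjustments add up to subtracting 1 + 8 * 256 = 2049 from it. Here is a small sketch of that idea. `docTextLength()` is a name made up for this example, the sample header is hand-built rather than taken from a real document, and a 64-bit build of PHP is assumed.

```php
<?php

/**
 * Hypothetical helper: compute the text length used by method 2
 * from the header bytes read at the start of a .doc file.
 */
function docTextLength(string $headers): int
{
    if (strlen($headers) < 0x220) {
        throw new InvalidArgumentException('Header block is too short.');
    }

    // 'V' decodes an unsigned 32-bit little-endian integer, that is
    // b0 + b1*256 + b2*256*256 + b3*256*256*256 for bytes 0x21C..0x21F.
    $raw = unpack('V', substr($headers, 0x21C, 4))[1];

    // Method 2 subtracts 1 from the first byte and 8 from the second,
    // which equals subtracting 1 + 8*256 = 2049 from the combined value.
    return $raw - 2049;
}

// Hand-built sample header: 0x0B at 0x21C, 0x09 at 0x21D, zero elsewhere.
$headers = str_repeat("\0", 0xA00);
$headers[0x21C] = "\x0B";
$headers[0x21D] = "\x09";

// Byte by byte this is (11 - 1) + (9 - 8) * 256 = 266.
echo docTextLength($headers), PHP_EOL;
```

Both forms give the same number. The `unpack()` version just makes it clearer that the code is reading one little-endian integer and shifting it by a fixed amount.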