Python Programming

Read data from a Word file

Data is sometimes stored in Word (.docx) files, so the text must first be extracted from the file before it can be used for further analysis.

This recipe requires the python-docx library. Note that the package is installed under the name python-docx but imported in code as docx. Installing it is simple: use pip.

pip install python-docx
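Because the install name (python-docx) differs from the import name (docx), it can help to confirm that the module is visible to the interpreter before running the recipe. The following is a minimal sketch using only the standard library; the function name is_docx_importable is a hypothetical helper chosen for illustration, not part of python-docx.

```python
import importlib.util


def is_docx_importable():
    """Return True if a top-level module named 'docx' can be found."""
    # find_spec returns None when no module with that name is on the path.
    return importlib.util.find_spec("docx") is not None


if is_docx_importable():
    print("Module 'docx' found.")
else:
    print("Module 'docx' not found; try: pip install python-docx")
```

This check only confirms that some module named docx is importable. It does not verify which distribution provided it.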

DOCX Data Collection for Analysis

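import docx
from docx.opc.exceptions import PackageNotFoundError


def main():
    try:
        # Open the Word document. python-docx raises PackageNotFoundError
        # when the path is missing or is not a valid .docx package.
        doc = docx.Document('test.docx')
    except (OSError, PackageNotFoundError):
        print('There was an error opening the file!')
        return

    # Collect the text of every paragraph, then join once after the loop.
    full_text = [para.text for para in doc.paragraphs]
    data = '\n'.join(full_text)
    print(data)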


if __name__ == '__main__':
    main()
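
For further analysis it is usually more convenient to get the text back as a value than to print it. The sketch below is one possible way to wrap the recipe in reusable functions. The names paragraphs_to_text and read_docx_text are hypothetical helpers, and dropping blank paragraphs is an assumption about what the analysis needs, not something python-docx does on its own.

```python
def paragraphs_to_text(paragraphs):
    """Join the stripped text of non-empty paragraphs with newlines.

    Accepts any iterable of objects that expose a .text attribute,
    such as the paragraphs of a python-docx Document.
    """
    lines = (p.text.strip() for p in paragraphs)
    return "\n".join(line for line in lines if line)


def read_docx_text(path):
    """Return the body text of the .docx file at path as one string."""
    # Imported here so paragraphs_to_text stays usable without python-docx.
    import docx

    return paragraphs_to_text(docx.Document(path).paragraphs)
```

A call such as text = read_docx_text('test.docx') then returns the document text for later processing. Note that doc.paragraphs covers body paragraphs only; text inside tables is not included and would have to be collected separately through doc.tables.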