How often do you open a previously used document and do a File, Save As to save a copy of the original document and amend the new copy?
It saves a lot of time and hassle, but did you know that private information in the original document may have travelled into the saved copy? This information is held in the Word document properties, also known as metadata.
I once paid a lawyer to draft a legal agreement for my business. The document was fabulous and given to me as a Microsoft® Word file so that I could easily add additional information each time I used the document. When I received the file, I noticed that the Word document’s metadata (document properties) held the name of another law firm, not the law firm that this document had been sent from. Not a good look at all.
How did this happen? Simple. My document was a copy of a document from another law firm. When the copy was made, possibly using File, Save As, the Word document’s metadata was carried over into my file.
The lawyer may not have been aware that Word files store metadata, information that can reveal details of the author and organisation from which it originated. Even though my lawyer had indeed drafted the document, the original template of the file had been created at another law firm. So even though it was his work, it looked as though it had originated elsewhere.
Metadata isn’t a word you hear used that often in normal business conversation. In fact, you may be asking “what is it” and “how do I find it”?
What is Metadata in Word (document properties)?
Each Word document has specific details (properties) that are captured. For example, properties can be the document title and the author’s name.
Sounds harmless enough. Don’t be fooled. Metadata in Word is hidden from sight. Unless you know where to find it and how to edit or remove it, you could be endangering the privacy of your clients or team members, or even be accused of plagiarism.
This extra information (document metadata) can travel from an original document into a copy. Removing the metadata from the copy ensures that information from the original doesn’t stay in the new document.
Therefore, if you are going to share an electronic copy of a document with clients or another organisation it is a good idea to review and, if required, remove metadata in Word prior to sharing.
In this blog, we will cover how to easily check and remove metadata from Word documents and remove metadata from a PDF.
Check and remove metadata from Word
Let’s look at an example.
In this example, an email has been sent to Anne containing an attached quote that has already been used for the company ABC Ltd.
Anne’s been told to open the quote document, do a File, Save As and then use it as a template to create a quote for another company. In this example, the new company is XYZ Ltd.
To do this, open the Quote document and select File.
Select Save as.
Change the name of the document to your new document name. In this example, it is XYZ Ltd Quote.
Select Save.
You should now have the same document but saved with a new name.
However, information can travel from the original document and this needs to be removed before sending it to new clients.
How to check the document properties in Word (Metadata)
Metadata is stored in most files. Essentially what you need to be looking for is the document ‘properties’ information for your Word file.
Go to the File tab and select Info.
The Properties area contains the metadata and this travels with the Word document.
Even though the Word document has been saved with a new name, the metadata is showing that the Title is still ABC Ltd Quote and the Tags are ABC Ltd.
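If you would like to check these properties without opening Word, here is a minimal Python sketch. It relies on the fact that a .docx file is a ZIP package and that the Title, Tags, Author and Last Modified By values are normally stored in a part called docProps/core.xml. The function name read_core_properties is my own for illustration, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in docProps/core.xml of an Office Open XML (.docx) file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_path):
    """Return Title, Tags, Author and Last Modified By (None where a field is absent)."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    wanted = {
        "title": "dc:title",
        "tags": "cp:keywords",  # Word shows keywords as "Tags"
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
    }
    return {name: root.findtext(tag, namespaces=NS) for name, tag in wanted.items()}
```

For the example above you could call read_core_properties("XYZ Ltd Quote.docx") (assuming that is where you saved the file) and look for any leftover ABC Ltd values in the result.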
Metadata removal
It is important to remove the information (metadata) in Word as it wouldn’t look great if another company’s name came through on your quote. Especially if they are competing companies.
To do this, highlight the information you want to change.
Press your Delete key to remove the information.
Metadata removal (How to delete Author information)
The file may also hold onto the author information.
To change this, right-click on the author’s name and then select Remove Person.
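If you have several quotes to clean, the same edits can be scripted. The following is a rough sketch, not a replacement for the steps above: it copies a .docx to a new file and empties the Title, Tags, Author and Last Modified By fields in docProps/core.xml. The function name blank_core_properties and the choice of fields are my own assumptions, and it only touches these core properties, so try it on a copy first.

```python
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
# Title, Tags (keywords), Author (creator) and Last Modified By.
FIELDS = [f"{{{DC}}}title", f"{{{CP}}}keywords", f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy"]


def blank_core_properties(source, target):
    """Copy a .docx from source to target, emptying the fields listed in FIELDS."""
    # Register the usual prefixes so the rewritten XML keeps familiar names.
    for prefix, uri in [
        ("cp", CP),
        ("dc", DC),
        ("dcterms", "http://purl.org/dc/terms/"),
        ("xsi", "http://www.w3.org/2001/XMLSchema-instance"),
    ]:
        ET.register_namespace(prefix, uri)
    with zipfile.ZipFile(source) as src, zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for field in FIELDS:
                    for element in root.iter(field):
                        element.text = None  # keep the element, drop its value
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Every other part of the package is copied across unchanged.
            dst.writestr(item, data)
```

The source and target must be different files, for example blank_core_properties("XYZ Ltd Quote.docx", "XYZ Ltd Quote cleaned.docx"). Open the cleaned file in Word afterwards and check File, Info to confirm the result.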
Check all document properties in Word
By default, you won’t see all the properties for the file, just a shortened list. Many of the properties are still hidden from view.
In my opinion the best place to check the document properties is by clicking the drop-down arrow next to the Properties list at the top of the pane and then selecting Advanced Properties.
This will then open a dialog box where you can navigate through several tabs to view an even fuller list of properties to make sure there isn’t any unwanted information in your document.
If there is any information you don’t want in the properties just select the information and delete it. Be sure to check every tab and then click OK. If you are unable to remove some data please see the steps for ‘Further check document properties in Word using the Inspect document feature’ below.
Now do a Save to update your document.
Go back to File, Info, Properties, Advanced Properties.
All the information that you deleted should now be gone.
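As a cross-check on the Advanced Properties dialog box, the sketch below lists every property it can find in the package. It assumes the usual layout where Word keeps properties in docProps/core.xml, docProps/app.xml (for example Company and Template) and, if custom properties exist, docProps/custom.xml. The function name list_all_properties is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_all_properties(docx_path):
    """Return {part name: [(property, value), ...]} for each docProps/*.xml part."""
    found = {}
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            if not (name.startswith("docProps/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            entries = []
            for child in root:
                # Custom properties carry their label in a "name" attribute;
                # core and app properties use the element name itself.
                label = child.get("name") or child.tag.split("}")[-1]
                value = "".join(child.itertext()).strip()
                if value:
                    entries.append((label, value))
            found[name] = entries
    return found
```

Anything that still mentions the previous client in the returned lists is worth removing in Word before you send the document.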
Further check document properties in Word using the Inspect document feature
Many of the file’s properties can easily be edited or removed simply by editing or deleting the content from a property field.
However, if you are unable to remove some of the data manually, you may like to check out the Inspect Document feature.
This feature finds and reveals any hidden data within your file that could be potentially sensitive. This includes document Properties that may have been generated and embedded into the document via a Document Management System, Template or custom macro.
Once Inspect Document has performed the check you can then choose to remove or leave the data.
Note: You might like to create a copy of your file prior to using the Inspector as it isn’t possible to restore the data once the Document Inspector removes it. To remind yourself which version no longer contains sensitive data you might like to add a tag or even detail this in the name of the document, e.g. “filename” – Inspected.docx.
Step 1: Open the file you want to inspect and then from the File tab click the Info tab.
Step 2: Click Check for Issues and then select Inspect Document.
Step 3: The Document Inspector dialog box will be displayed.
Step 4: Select the check boxes for the content you would like inspected.
Note: Please refer to the ‘Document Inspector options’ at the end of this post for a full list of what the Inspector finds and removes.
Step 5: Click Inspect.
Step 6: The results of the inspection will be displayed. Click Remove All next to the data you want removed from the document.
The unwanted data will be removed from your document.
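For a quick first look before you run the Inspector, the sketch below flags a few kinds of hidden content in a .docx file. It is not the Document Inspector and covers far less: it only looks for the part names and elements Word conventionally uses (these are assumptions about a typical file), and it reports findings without removing anything. The function name quick_inspect is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def quick_inspect(docx_path):
    """Return rough True/False findings for a .docx file. Nothing is modified."""
    with zipfile.ZipFile(docx_path) as package:
        names = package.namelist()
        body = ET.fromstring(package.read("word/document.xml"))
    tags = {element.tag for element in body.iter()}
    return {
        # Comments are normally stored in their own part.
        "comments": "word/comments.xml" in names,
        # Tracked insertions and deletions appear as w:ins and w:del elements.
        "tracked_changes": f"{{{W}}}ins" in tags or f"{{{W}}}del" in tags,
        # Text formatted as Hidden via the Font dialog box is marked with w:vanish.
        "hidden_text": f"{{{W}}}vanish" in tags,
        # Macro-enabled files carry a VBA project part.
        "macros": "word/vbaProject.bin" in names,
        "custom_xml": any(n.startswith("customXml/") for n in names),
        "embedded_documents": any(n.startswith("word/embeddings/") for n in names),
    }
```

A True value only tells you where to look. Use the Document Inspector steps above to review and remove the content.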
Word Document Inspector Options
The table below displays a full list of what the Inspector finds and removes.
| Document Inspector Options | Action |
|---|---|
| Comments, Revisions, Versions and Annotations | Removes Comments, Revision marks from tracked changes, Document version information and Ink annotations. |
| Document Properties and Personal Information | Document properties, including information from the Summary, Statistics, and Custom tabs of the Document Properties dialog box, Content type info, the User (author) name and Template name. |
| Task Pane Apps | If your organisation uses customised Task Pane Apps the Inspector will locate and remove them from your document. If Inspector finds a Task Pane App in your document and you are unsure of what it is you should speak with your IT dept. |
| Embedded documents | Identifies if a document has been embedded into your document. |
| Macros, Forms and Active X controls | Identifies where macros, forms and Active X controls will travel with the document. |
| Collapsed Headings | Identifies where text is hidden by collapsed Headings. |
| Custom XML Data | If your organisation utilises customised XML data this could hold information that will travel with the document. If Inspector finds XML in your document and you are unsure of what it is you should speak with your IT dept. |
| Headers, Footers and Watermarks | Information in headers and footers and any Watermarks. In most cases we would not need to remove these. |
| Invisible Content | Objects that are not visible because they have been formatted as invisible, e.g. via the Selection Pane. |
| Hidden Text | Removes any text that has been formatted as Hidden Text via the Font dialog box. This doesn’t include any text that has been hidden by other methods, e.g. white font or behind a picture. |
Remove metadata from PDF
Sometimes when you save the document as a PDF the original name of the document stays within the title of the file.
Let’s look at an example of this in Word.
The image below shows a document saved as ‘MS Office Essential Skills – Stage 1.docx’. This document has been created from a copy of an existing document called ‘Excel Getting Started Step 1.docx’.
When we go to the File tab and select Info, the Properties area shows the Title of the original document, ‘Excel Getting Started Step 1’, still saved in the Properties.
Even though the document has been saved as a copy and renamed, the original file name is still embedded into the document Properties.
Therefore, when we come to save the file as a PDF, the original file name travels into the PDF properties too.
Let’s walk through this.
To save the Word document as a PDF, from the File tab we can select Save a Copy.
From the Save as drop-down options box, select PDF.
And then click Save.
Now when you open the PDF file you will see that it has held onto the Document Properties Title, which is now in the PDF properties.
I’m sure you will agree that this could look bad if your original Title has a customer’s name in it and you were sending the new copy to another customer.
To ensure the original Title doesn’t travel into the PDF, before you save the document as a PDF, in Word, go to File and select Info.
In the Title box, highlight the original title.
Press your Delete key to remove the Title or insert a different title.
Select Save to update the file. Now from the File tab select Save a Copy.
Follow the steps above to save as a PDF. You should now have removed the metadata from the PDF and have the correct new title for your PDF.
Note: if you don’t have the PDF available as a Word document you may need to use a PDF editing application like Adobe Acrobat Pro or Sejda.com to remove or edit the PDF metadata.
That’s a good and thorough article! If you already have a bunch of PDFs that were converted from Word with the Title included, you can use bulk metadata scrubbers like BatchPurifier to clean them.