PowerShell to find Text in PDF and Word

Continuing from my previous article on using PowerShell to find hyperlink text in PowerPoint (ppt) files, here I show how to find hyperlinks in Word or PDF files. The script drives Word through COM automation, so Word must be installed on the machine. The sample code is below; tweak it for your purpose and drop me a comment if you need help.


$FilePath = "C:\Users\xxx\"
# Word 2013 and later can open PDFs by converting them; change the filter to "*.doc*" for Word files
$OurDocuments = Get-ChildItem -Path $FilePath -Filter "*.pdf" -Recurse -File

$Word = New-Object -ComObject word.application
$Word.Visible = $false
$i = 0

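$OurDocuments | ForEach-Object {
    $File = $_          # keep the file object; $_ means something else inside catch
    $Document = $null
    try {
        # Open(FileName, ConfirmConversions, ReadOnly): no conversion prompt, read-only
        $Document = $Word.Documents.Open($File.FullName, $false, $true)

        # A foreach statement (not ForEach-Object) so that break leaves only this inner loop
        foreach ($Link in $Document.Hyperlinks) {
            if ($Link.Address -like "https://domain.com*" -or $Link.TextToDisplay -like "https://domain.com*") {
                "Found issues {0} `r`n" -f $File.FullName
                "Found issues {0} `r`n" -f $Link.Address
                "Found issues {0} `r`n" -f $Link.TextToDisplay
                break   # report only the first matching link in each file
            }
        }
    }
    catch {
        Write-Warning ("Error has occurred while accessing {0}: {1}" -f $File.FullName, $_.Exception.Message)
    }
    finally {
        # Close without saving so opened documents do not pile up inside Word
        if ($Document) { $Document.Close($false) }
    }

    $i++
    Write-Progress -Activity "Searching Hyperlinks" -Status "Progress:" -PercentComplete ($i / @($OurDocuments).Count * 100)
}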

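$Word.Quit()
# Release the COM reference so the WINWORD.EXE process can exit
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word) | Out-Null
[gc]::Collect()
[gc]::WaitForPendingFinalizers()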
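
If you are tweaking the search pattern, it may help to try the match on its own before pointing the script at real documents. The snippet below is only a rough sketch and not part of the script above: Test-LinkMatch is a hypothetical helper name, and the sample links are made-up objects that merely mimic the Address and TextToDisplay properties of a Word hyperlink, so it runs without Word.

```powershell
# Hypothetical helper (an illustration, not a Word or PowerShell built-in):
# returns $true when the link target or its visible text matches the wildcard pattern.
function Test-LinkMatch {
    param(
        [Parameter(Mandatory)] $Link,
        [string] $Pattern = "https://domain.com*"
    )
    return ($Link.Address -like $Pattern) -or ($Link.TextToDisplay -like $Pattern)
}

# Made-up sample data standing in for the entries of $Document.Hyperlinks
$SampleLinks = @(
    [pscustomobject]@{ Address = "https://domain.com/page"; TextToDisplay = "Our page" }
    [pscustomobject]@{ Address = "https://example.org/";    TextToDisplay = "https://domain.com/old" }
    [pscustomobject]@{ Address = "https://example.org/";    TextToDisplay = "Elsewhere" }
)

# Keep only the links that match; @() keeps the result an array even for a single hit
$Hits = @($SampleLinks | Where-Object { Test-LinkMatch -Link $_ })

$Hits | ForEach-Object { "Match: {0} ({1})" -f $_.Address, $_.TextToDisplay }
```

The first two sample links match, one by address and one by display text. Note that -like is case-insensitive and that the trailing * means "starts with"; use "*domain.com*" if the text can appear anywhere in the link.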
