Powershell to find Text in PDF and Word

In continuation to my previous article on PowerShell to find hyperlink texts in ppt, here I am going to show how to find hyperlinks in Word or PDF files. Sample code below, tweak it for your purpose and drop me a comment if you need help.


$FilePath= "C:\Users\xxx\" 
$OurDocuments = Get-ChildItem -Path "$FilePath" -Filter "*.pdf" -Recurse #change to .doc* for word

$Word = New-Object -ComObject word.application
$Word.Visible = $false
$i = 0

$OurDocuments | ForEach-Object {
try {
    $Document = $Word.Documents.Open($_.FullName,$false,$true) 
    #"Processing file: {0}" -f $Document.FullName
    
    try{
    $Document.Hyperlinks | ForEach-Object {
        if ($_.Address -like "https://domain.com*" -or $_.Text -like "https://domain.com*") 
        {
                "Found issues {0} `r`n" -f $Document.Fullname 
                "Found issues {0} `r`n" -f $_.Address
                "Found issues {0} `r`n" -f $_.Hyperlinks
                break
         } 
      

    }
    }catch{Write-Host "Error has occured while accessing" $Document.FullName}
    }
    catch{Write-Error $Document.FullName}

   #"Completed processing {0} `r`n" -f $Document.Fullname
    
  Write-Progress -Activity "Searching Hyperlinks" -Status "Progress:" -PercentComplete ($i/$OurDocuments.count*100)
  $i++
}

$Word.Quit()
[gc]::collect()
[gc]::WaitForPendingFinalizers()

1 comment:

sindhu said...

Hi,

Powershell to find a text in word works for me but pdf doesn't work could you please help its really urgent? i need ti get the hyperlinks from the pdf and replace it.

Running Total In SQL Server

Running total refers to the sum of values in all cells of a column that precedes the next cell in that particular column. Using Window Funct...