Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
Oh, good chance to ask: how good is Acrobat's OCR? I've been using the one built into Google Drive, but it can't batch-process files.
It's pretty good, but it's time-consuming for larger files. For context, I was doing legal work in oil and gas and had to determine whether certain assignments pertained to certain leases (an assignment is when one company conveys lease rights to another; I'll lump things like mortgages and financing statements into this category). Companies often do this in large documents conveying several thousand interests at once. Simply reading these documents through is incredibly time-consuming, especially since most of them are ordered by some internal lease number rather than alphabetically, geographically, or by any other parameter I have access to. It gets even worse when they're conveying different interests for different leases and there are several exhibits to go through.

After OCR I'd usually search by lessor name first. If I found what I was looking for, great; if not, I'd try the parcel number, and if that failed, I'd search by the recording information for the original lease. These latter two were kind of dicey because the information is often laid out in a table, and the OCR occasionally has trouble determining where the line breaks are. With a name you at least have the security of knowing that the first few letters will be consecutive, with no line break in between. If I got to this point and didn't find anything, I figured I could safely assume the document didn't apply to the lease I was concerned about, unless, of course, there was some kind of blanket language, but that's usually easy to find. It wasn't 100% accurate, though: there were cases where I knew what I was looking for was in there, but it wasn't coming up because of a typo, bad scanning, too-small print, etc., at which point I'd have to go through the whole document manually.
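If you ever want to script that same fallback order against exported OCR text, here's a rough sketch in Python. The function name, the sample page, and the second "ignore all whitespace" pass are my own illustration (not a feature of Acrobat or Google Drive); that second pass is one way to catch a parcel or recording number the OCR split across a table line break, with the caveat that it can throw false hits on short keys.

```python
import re
from typing import Optional, Tuple


def _collapse(text: str) -> str:
    # Treat any run of spaces, tabs, or OCR line breaks as one space; ignore case.
    return re.sub(r"\s+", " ", text).strip().lower()


def _squash(text: str) -> str:
    # Drop whitespace entirely, for numbers the OCR split across lines or cells.
    return re.sub(r"\s+", "", text).lower()


def find_lease_reference(
    ocr_text: str, lessor: str, parcel: str, recording: str
) -> Optional[Tuple[str, str]]:
    """Try lessor, then parcel, then recording info (hypothetical helper).

    Returns (which key matched, how it matched), or None if nothing hit,
    in which case the document still needs a manual read for blanket language.
    """
    collapsed = _collapse(ocr_text)
    squashed = _squash(ocr_text)
    for label, key in (("lessor", lessor), ("parcel", parcel), ("recording", recording)):
        if not key.strip():
            continue  # skip keys we don't have
        if _collapse(key) in collapsed:
            return label, "exact"
        if _squash(key) in squashed:
            return label, "ignoring whitespace"
    return None


if __name__ == "__main__":
    # Made-up exhibit text where the OCR broke the parcel number across lines.
    page = "Exhibit A\nLease No. 4471 Lessor: JOHN Q. SMITH\nParcel 12-34\n5-678 Book 210 Page 33"
    print(find_lease_reference(page, "Jane Doe", "12-345-678", "Book 210 Page 33"))
```

On the made-up page, the name search misses, and the parcel number only turns up on the whitespace-insensitive pass, which mirrors why the table fields were the dicey ones.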
My superiors didn't like relying on OCR because of this, but in my experience mindlessly scanning page after page was more likely to lead to an error than the OCR was. The advice I'd give to the client relied pretty heavily on the applicability of certain of these documents, so I'd say that it's probably good enough for whatever you plan on using it for, assuming that it isn't an application that could get you fired or cause some other kind of serious problem.
I never had to batch-process files, so I can't comment on how well that works. One final caution: adding OCR data makes file sizes balloon considerably. The firm I worked at required us to strip out every exhibit page from these documents except the directly applicable ones, to keep the client's already-large product from ballooning to unmanageable levels and taking up too much room on our cloud storage. That was followed by a prohibition on including OCR'd pages in our final client PDFs for the same reason, since we saved copies of all our work and it was taking up entirely too much space. It wasn't uncommon for one of these large documents to exceed 300 megs because of all the additional OCR data. So if you plan on saving all of these PDFs locally, it's something to be aware of.
Wow, thanks for the review. If you trusted it with that, it must be more than good enough for the stuff I was doing (casually browsing through old French books).