Automatically compare new PDFs in a folder

Let’s create a Java app that watches a folder for file changes and automatically compares new PDFs.

The idea is simple: whenever two new PDFs have been added to the folder, we compare them using i-net PDFC, write a report of the comparison to a “reports” folder, and move both source PDFs to an archive folder.

To start with, we will use Oracle’s own WatchDir example, which already uses Java’s WatchService API to watch a directory for changes. The only modification we make is to pass in a listener that handles the WatchEvents generated when files in the directory change:

    ...
    // callback through which WatchDir reports every file system event it receives
    interface WatchEventListener {
        void directoryChanged(WatchEvent<Path> event);
    }
    private WatchEventListener listener;
    ...

    WatchDir(Path dir, boolean recursive,
             WatchEventListener listener) throws IOException {
        this.listener = listener;
        ...
    }
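Because WatchEventListener has a single abstract method, it is a functional interface, which is what lets the listener later be written as a lambda. It also means a listener can be exercised without watching a real directory. The sketch below assumes a hand-made WatchEvent for that purpose; ListenerSketch and fakeEvent are hypothetical names and not part of Oracle's sample or of PDFC.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {

    // Same shape as the listener added to WatchDir: one abstract method,
    // so any lambda that accepts a WatchEvent<Path> can implement it.
    interface WatchEventListener {
        void directoryChanged(WatchEvent<Path> event);
    }

    // Hypothetical helper: a hand-made event, useful for testing a listener
    // without a real file system watch.
    static WatchEvent<Path> fakeEvent(WatchEvent.Kind<Path> kind, Path context) {
        return new WatchEvent<Path>() {
            @Override public WatchEvent.Kind<Path> kind() { return kind; }
            @Override public int count() { return 1; }
            @Override public Path context() { return context; }
        };
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        // the listener is just a lambda that records what it was told
        WatchEventListener listener = ev -> seen.add(ev.kind().name() + " " + ev.context());
        listener.directoryChanged(fakeEvent(StandardWatchEventKinds.ENTRY_CREATE, Paths.get("a.pdf")));
        System.out.println(seen); // prints [ENTRY_CREATE a.pdf]
    }
}
```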

Additionally, WatchDir must pass every event it receives on to this listener, rather than only logging it as the default WatchDir implementation does:

    void processEvents() {
        for (;;) {
            ...
            // print out event
            System.out.format("%s: %s\n", event.kind().name(), child);
            // send event to listeners
            this.listener.directoryChanged( ev );
            ...
        }
    }
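If Oracle's WatchDir is not at hand, the same listener hook can be sketched in a compact, self-contained class. This is an assumption-laden stand-in, not Oracle's code: MiniWatchDir and pollOnce are hypothetical names, and unlike WatchDir it watches a single directory non-recursively. It shows the essential loop: wait for a WatchKey, forward each event to the listener, then reset the key so it keeps delivering events.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal watcher (not Oracle's WatchDir): one directory,
// no recursion, every event forwarded to a listener.
public class MiniWatchDir {

    // Same callback shape as the listener added to WatchDir
    interface WatchEventListener {
        void directoryChanged(WatchEvent<Path> event);
    }

    private final WatchService watcher;
    private final Path dir;
    private final WatchEventListener listener;

    MiniWatchDir(Path dir, WatchEventListener listener) throws IOException {
        this.watcher = dir.getFileSystem().newWatchService();
        this.dir = dir;
        this.listener = listener;
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_DELETE,
                StandardWatchEventKinds.ENTRY_MODIFY);
    }

    // Blocks until events arrive; stops once the directory is no longer accessible.
    void processEvents() throws InterruptedException {
        while (dispatch(watcher.take())) {
            // keep waiting for the next batch of events
        }
    }

    // Waits at most the given time for one batch of events (handy for tests).
    boolean pollOnce(long timeout, TimeUnit unit) throws InterruptedException {
        WatchKey key = watcher.poll(timeout, unit);
        return key != null && dispatch(key);
    }

    @SuppressWarnings("unchecked")
    private boolean dispatch(WatchKey key) {
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events may have been lost; nothing to forward
            }
            WatchEvent<Path> ev = (WatchEvent<Path>) event;
            // the event context is relative to the watched directory
            System.out.format("%s: %s%n", ev.kind().name(), dir.resolve(ev.context()));
            listener.directoryChanged(ev);
        }
        return key.reset(); // false once the key is no longer valid
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        MiniWatchDir watchDir = new MiniWatchDir(dir,
                ev -> System.out.println("listener received " + ev.kind().name() + " for " + ev.context()));
        watchDir.processEvents();
    }
}
```

Splitting the loop into take() for normal use and poll(timeout) for tests is only a convenience here; Oracle's version simply loops on take().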

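With the modified WatchDir in place, connecting it to the i-net PDFC API is straightforward. First, we set up our paths from the command-line arguments passed to our tool (-s for the source folder, -a for the archive folder, -r for the reports folder):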

    public static void main(String[] args) throws Exception {
        String sourceFolder = "";
        String archiveFolder = "sourceArchive";
        String reportsFolder = "reports";
        for (int i=0; i<args.length-1; i+=2) {
            if ("-s".equals( args[i] )) {
                sourceFolder = args[i+1];
            } else if ("-a".equals( args[i] )) {
                archiveFolder = args[i+1];
            } else if ("-r".equals( args[i] )) {
                reportsFolder = args[i+1];
            }
        }
        Path sourcePath = Paths.get( sourceFolder );
        Path archivePath = Paths.get( archiveFolder );
        Path reportsPath = Paths.get( reportsFolder );
        if (!Files.exists( sourcePath )) {
            Files.createDirectories( sourcePath );
        }
        if (!Files.exists( archivePath )) {
            Files.createDirectories( archivePath );
        }
        if (!Files.exists( reportsPath )) {
            Files.createDirectories( reportsPath );
        }
        ...
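The argument handling above can be pulled out of main so its defaults are easy to check. This is a hypothetical refactoring (ToolOptions and parse are illustrative names, not part of the sample) that keeps the same rules: options come in "-flag value" pairs, unknown flags are skipped, and a trailing flag with no value is ignored.

```java
// Hypothetical helper: the sample's argument handling, extracted from main.
public class ToolOptions {
    String sourceFolder = "";               // empty path = current working directory
    String archiveFolder = "sourceArchive";
    String reportsFolder = "reports";

    static ToolOptions parse(String[] args) {
        ToolOptions opts = new ToolOptions();
        // Options come in "-flag value" pairs; a trailing flag with no value is ignored.
        for (int i = 0; i < args.length - 1; i += 2) {
            switch (args[i]) {
                case "-s": opts.sourceFolder = args[i + 1]; break;
                case "-a": opts.archiveFolder = args[i + 1]; break;
                case "-r": opts.reportsFolder = args[i + 1]; break;
                default: break; // unknown flags are skipped, as in the original loop
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        ToolOptions opts = parse(new String[] { "-s", "inbox", "-r", "out" });
        System.out.println(opts.sourceFolder + " " + opts.archiveFolder + " " + opts.reportsFolder);
        // prints: inbox sourceArchive out
    }
}
```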

Next, we set up our WatchDir listener to react to ENTRY_CREATE events for PDF files. The first new PDF is remembered; once a second PDF is detected, we launch the PDFC comparison with a DifferencesPDFPresenter, which writes our comparison report to a dated subfolder of the reports folder. Finally, we move both source files to the archive folder and wait for the next pair.

    private static Path firstPdfPath;
    ...

        System.out.println("watching for new files at " + sourcePath.toAbsolutePath());
        WatchDir.WatchEventListener listener = ev -> {
            // the event context is relative to the watched folder, so resolve it against sourcePath
            Path path = sourcePath.resolve( ev.context() ).toAbsolutePath();
            try {
                String fileName = path.getFileName().toString().toLowerCase();
                if (ev.kind() == StandardWatchEventKinds.ENTRY_CREATE &&
                                fileName.endsWith( ".pdf" )) {
                    if (firstPdfPath == null) {
                        // remember the first PDF and wait for a second one
                        firstPdfPath = path;
                    } else {
                        // take the pending PDF and reset, so the next two new PDFs form a new pair
                        Path first = firstPdfPath;
                        firstPdfPath = null;
                        PDFComparer comparer = new PDFComparer();
                        String datetime = DateTimeFormatter.ofPattern( "yyyy-MM-dd" ).format( LocalDateTime.now() );
                        File subfolder = new File( reportsPath.toFile(), "ComparisonReports-" + datetime );
                        comparer.addPresenter( new DifferencesPDFPresenter( subfolder ) );
                        ResultModel result = comparer.compare( first.toFile(), path.toFile() );
                        System.out.println("Compared " + first + " to " + path + ".");
                        System.out.println("Differences found: " + result.getDifferencesCount( false ));
                        // move both source PDFs out of the watched folder into the archive
                        Files.move( first, archivePath.resolve( first.getFileName() ) );
                        Files.move( path, archivePath.resolve( path.getFileName() ) );
                    }
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        };
        WatchDir watcher = new WatchDir( sourcePath, false, listener );
        watcher.processEvents();
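The pairing rule itself ("every two new PDFs form one comparison") can be isolated from PDFC and tested on its own. The sketch below is a hypothetical helper (PdfPairer and onCreated are illustrative names) that assumes it is fed the context path of each ENTRY_CREATE event. Two details matter: the context must be resolved against the watched folder, because WatchEvent contexts are relative paths, and the pending file must be cleared after each pair, otherwise every later PDF would be compared against a first file that has already been moved to the archive.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: the "compare every two new PDFs" rule, without PDFC.
public class PdfPairer {
    private final Path watchedDir;
    private Path pendingPdf; // first PDF of the next pair, or null

    PdfPairer(Path watchedDir) {
        this.watchedDir = watchedDir;
    }

    // Call for each ENTRY_CREATE context; returns [first, second] once a pair is complete.
    List<Path> onCreated(Path context) {
        // WatchEvent contexts are relative to the watched directory.
        Path path = watchedDir.resolve(context).toAbsolutePath();
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".pdf")) {
            return Collections.emptyList(); // ignore non-PDF files
        }
        if (pendingPdf == null) {
            pendingPdf = path; // wait for a second PDF
            return Collections.emptyList();
        }
        List<Path> pair = Arrays.asList(pendingPdf, path);
        pendingPdf = null; // start a fresh pair with the next PDF
        return pair;
    }

    public static void main(String[] args) {
        PdfPairer pairer = new PdfPairer(Paths.get("inbox"));
        for (String file : new String[] { "a.pdf", "notes.txt", "b.PDF", "c.pdf" }) {
            List<Path> pair = pairer.onCreated(Paths.get(file));
            if (!pair.isEmpty()) {
                System.out.println("compare " + pair.get(0).getFileName() + " with " + pair.get(1).getFileName());
            }
        }
        // prints only: compare a.pdf with b.PDF  (c.pdf is still waiting for a partner)
    }
}
```

In the listener, the returned pair would then be handed to PDFComparer and moved to the archive folder, as in the sample.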

Note that to run this sample successfully, you will need to:

  • include PDFC.jar in your classpath;
  • have the plugins pdfcparserplugin.zip and reporting.zip in the plugins folder of your working directory;
  • have at least a trial license in your i-net PDFC configuration.

The complete sample can be found in the PDFC samples repository on GitHub.