Comparing photos and searching for duplicates is a difficult and demanding task. Libraries like ImageMagick make it much simpler. This library finds similar photos in directories and extracts duplicate images using Classifiers, Normalizers and Comparators.

GitHub Repository

Requirements

  • ImageMagick extension
  • GD extension
  • PHP >= 5.6

Installation

composer require yncki/php-fast-image-compare

Methods

  • areSimilar
  • areDifferent
  • findDuplicates
  • findUniques
  • clearCache
  • setTemporaryDirectory
  • setTemporaryDirectoryPermissions
  • registerClassifier
  • registerComparator
  • setChunkSize
  • setCacheAdapter

Usage

Enough percentage (difference percentage)

Every comparison needs an "enough percentage": the maximum difference at which two images still count as similar.
It is a float between 0.0 and 1.0, where 0.0 = 0%, 0.01 = 1%, etc.
The recommended range is 0.05 to 0.15 for the MAE metric with an 8px normalized sample. The default is 0.05 (5%).
Note that each metric reports different values for the same pair of images. For a bigger normalized sample you should increase this value.
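
To make the threshold concrete, here is a minimal sketch of how a normalized difference can be checked against the enough percentage. The helper meanAbsoluteError() is hypothetical and not part of the library; it assumes both samples are flat lists of 8-bit grayscale values of equal length.

```php
<?php
// Hypothetical helper (not part of the library): mean absolute error between
// two equally sized lists of 8-bit grayscale values, scaled to 0.0 - 1.0.
function meanAbsoluteError(array $a, array $b)
{
    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Samples must be non-empty and equally sized.');
    }
    $sum = 0;
    foreach ($a as $i => $value) {
        $sum += abs($value - $b[$i]);
    }
    // Dividing by 255 maps the per-pixel error from 0..255 to 0..1.
    return $sum / (count($a) * 255);
}

// Two tiny 2x2 samples standing in for normalized images.
$first  = [0, 64, 128, 255];
$second = [0, 70, 120, 255];

$difference = meanAbsoluteError($first, $second); // (0 + 6 + 8 + 0) / (4 * 255)
$enoughPercentage = 0.05;                         // 5%

// The pair counts as similar when the difference does not exceed the threshold.
$similar = $difference <= $enoughPercentage;
```

Here the difference is roughly 1.4%, so the pair passes a 5% threshold; with a threshold of 0.01 it would not.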

Basic

To find similar images in an array whose difference is not greater than 10%:

Code
require_once 'vendor/autoload.php';
use pepeEpe\FastImageCompare\FastImageCompare;

$instance = new FastImageCompare();
$images = array(
    'path/to/file/1',
    'path/to/file/2',
    'path/to/file/3'
);
$similarArray = $instance->findDuplicates($images, 0.10);

Preferred picks

Preferred picks is a mechanism for selecting which image to keep from a set of duplicates, based on a specified attribute.

Currently supported preferred pick modes are:

  • PREFER_ANY
  • PREFER_LARGER_IMAGE - more pixels (default)
  • PREFER_SMALLER_IMAGE - fewer pixels
  • PREFER_LARGER_DIFFERENCE - larger difference
  • PREFER_LOWER_DIFFERENCE - smaller difference
  • PREFER_COLOR - color first
  • PREFER_GRAYSCALE - grayscale first

For now only the findUniques method supports preferred picks.

For example, consider this task:
Find unique images and, when a duplicate is found, keep the larger image.

Code
require_once 'vendor/autoload.php';
use pepeEpe\FastImageCompare\FastImageCompare;

$instance = new FastImageCompare();
$images = array(
    'path/to/file/imageA_original_1000x500.png',
    'path/to/file/imageA_resized_200x125.png',
    'path/to/file/imageC.png'
);
$similarArray = $instance->findUniques($images,0.10,FastImageCompare::PREFER_LARGER_IMAGE);

// returns $similarArray  = [
//    'path/to/file/imageA_original_1000x500.png', <-- library excluded imageA_resized_200x125.png because it was smaller than original
//    'path/to/file/imageC.png'
// ]
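
The idea behind PREFER_LARGER_IMAGE can be sketched without the library. The function pickLargerImage() and the array layout below are assumptions made for illustration only; the library's internal selection logic may differ.

```php
<?php
// Hypothetical sketch (not the library's code): from a group of duplicates,
// keep the path of the image with the most pixels (width * height).
function pickLargerImage(array $duplicates)
{
    $best = null;
    foreach ($duplicates as $image) {
        $pixels = $image['width'] * $image['height'];
        if ($best === null || $pixels > $best['width'] * $best['height']) {
            $best = $image;
        }
    }
    return $best === null ? null : $best['path'];
}

// Dimensions are hard-coded here; in practice they could be read with getimagesize().
$group = [
    ['path' => 'imageA_original_1000x500.png', 'width' => 1000, 'height' => 500],
    ['path' => 'imageA_resized_200x125.png',  'width' => 200,  'height' => 125],
];

$kept = pickLargerImage($group);
```

On ties the first image seen wins, which is one possible choice; a PREFER_SMALLER_IMAGE variant would only flip the comparison.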

Classifiers

Before comparing, the first step is to classify images into groups. You can register multiple classifiers using the registerClassifier() method. Each classifier takes a file path as input and returns tags that describe the image.
Images in different groups (with different tags) are treated as different. By default no classifiers are registered.

Available classifiers

  • ClassifierColor - optimized classifier that counts colors and determines whether an image is grayscale
  • ClassifierFileExtension
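
As a rough illustration of how tag-based grouping works, the sketch below groups file paths by a tag derived from the file extension. Both functions and the 'ext:' tag format are hypothetical stand-ins, not the library's classifier interface.

```php
<?php
// Hypothetical stand-in for a classifier: takes a file path, returns a list of tags.
function classifyByExtension($path)
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return [$extension === '' ? 'ext:none' : 'ext:' . $extension];
}

// Group paths by their tag set; only images inside the same group would
// need to be compared with each other.
function groupByTags(array $paths)
{
    $groups = [];
    foreach ($paths as $path) {
        $tags = classifyByExtension($path);
        sort($tags); // make the group key independent of tag order
        $groups[implode('|', $tags)][] = $path;
    }
    return $groups;
}

$groups = groupByTags(['a.PNG', 'b.png', 'c.jpg']);
// 'a.PNG' and 'b.png' share the group 'ext:png'; 'c.jpg' is alone in 'ext:jpg'.
```

Grouping first reduces the number of pairwise comparisons, since images with different tags never reach a comparator.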

Normalizers

It is a good idea to normalize images before comparing. Normalizers are registered per comparator using the registerNormalizer() method. Multiple normalizers can be registered for each comparator. You can write your own normalizers by extending the NormalizableBase class.

Available normalizers:

  • NormalizerSquaredSize - resizes images to a common width and height (8x8 px by default) and converts them to PNG format.
  • NormalizerGrayScale - converts the image colorspace to grayscale.

Comparators

Comparators compare two images and can expose additional settings. By default only one comparator, ComparatorImageMagick, is registered with default settings. You can write your own comparators by extending the ComparableBase class.
Each comparator can work in one of two modes: STRICT or PASSTHROUGH.
A comparator can have multiple normalizers.

Available comparators:

  • ComparatorFileCrc32b
  • ComparatorImageMagick

To register comparators manually, pass null as the second argument to the constructor or use the setComparators() method.

Code
require_once 'vendor/autoload.php';
use pepeEpe\FastImageCompare\FastImageCompare;
use pepeEpe\FastImageCompare\ComparatorImageMagick;
use pepeEpe\FastImageCompare\NormalizerSquaredSize;
use pepeEpe\FastImageCompare\ComparatorFileCrc32b;
use pepeEpe\FastImageCompare\IComparable;

$instance = new FastImageCompare('/my/tmp/dir/',null);

//register crc32b comparator in STRICT mode
//STRICT - when the compared images have identical hashes, the next comparators won't run
$instance->registerComparator(new ComparatorFileCrc32b(),IComparable::STRICT);

//create second Comparator ( ComparatorImageMagick ) with metric NCC and no normalizers
$imageMagickComparator = new ComparatorImageMagick(ComparatorImageMagick::METRIC_NCC,[]);

//register size normalizer for imageMagick with sample size 16px x 16px
$imageMagickComparator->registerNormalizer(new NormalizerSquaredSize(16));

//register second comparator as PASSTHROUGH
$instance->registerComparator($imageMagickComparator,IComparable::PASSTHROUGH);

//ready for use

ComparatorImageMagick settings

This comparator supports various settings, but the most important is the comparison metric. Refer to the ImageMagick documentation for more info.

  • Metrics - set with setMetric() or in the constructor
    • ComparatorImageMagick::METRIC_AE - Absolute error (count of the number of different pixels)
    • ComparatorImageMagick::METRIC_MAE - Mean absolute error (average channel error distance)
    • ComparatorImageMagick::METRIC_NCC - Normalized cross correlation
    • ComparatorImageMagick::METRIC_MSE - Mean squared error (averaged squared error distance)
    • ComparatorImageMagick::METRIC_RMSE - Root mean squared error, i.e. sqrt(MSE)
  • setIgnoreAlpha($boolean) - when comparing transparent images, set to true to ignore the alpha channel

Metrics

All metrics have been implemented to return a normalized difference value between 0.0 and 1.0.
Metric availability depends on the ImageMagick version installed on the target machine.
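
To see why the same pair of images yields different values under different metrics, here is a small sketch that computes MAE, MSE and RMSE on identical input. The function metricDifferences() is hypothetical; it follows the textbook definitions on 8-bit values scaled to 0.0 - 1.0 and is not taken from the library or from ImageMagick.

```php
<?php
// Hypothetical sketch (textbook definitions, not the library's code):
// MAE, MSE and RMSE between two equally sized lists of 8-bit values,
// each normalized to the 0.0 - 1.0 range.
function metricDifferences(array $a, array $b)
{
    $n = count($a);
    if ($n === 0 || $n !== count($b)) {
        throw new InvalidArgumentException('Samples must be non-empty and equally sized.');
    }
    $absSum = 0.0;
    $sqSum = 0.0;
    foreach ($a as $i => $value) {
        $delta = ($value - $b[$i]) / 255; // per-value error scaled to -1..1
        $absSum += abs($delta);
        $sqSum += $delta * $delta;
    }
    $mse = $sqSum / $n;
    return ['MAE' => $absSum / $n, 'MSE' => $mse, 'RMSE' => sqrt($mse)];
}

// One of four pixels flips from black to white.
$result = metricDifferences([0, 0, 0, 0], [0, 0, 0, 255]);
// MAE = 0.25, MSE = 0.25, RMSE = 0.5
```

With this input RMSE reports twice the MAE value, so an enough percentage tuned for one metric should not be reused for another without checking.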

Caching and Temporary directory

Because comparing images is CPU-intensive, you should enable caching. There are two levels of cache. The first is the temporary directory, which caches normalizer output; the second is a cache interface for comparator results.
A temporary directory is required; if none is specified, the system temp directory is used.
For the comparator cache you can use any PSR-6 compatible caching adapter.

To register a cache adapter, here Symfony's FilesystemAdapter stored in the same directory as the temporary directory:
use Symfony\Component\Cache\Adapter\FilesystemAdapter;
$instance->setCacheAdapter(new FilesystemAdapter('', 3600, $instance->getTemporaryDirectory()));
To clear the entire cache:
$instance->clearCache();
To clear cache files older than 24h, pass the age in seconds as the first parameter:
$instance->clearCache(60 * 60 * 24);

Examples

Basic Example 1 - Comparing two files

Code
require_once 'vendor/autoload.php';
use pepeEpe\FastImageCompare\FastImageCompare;

$instance = new FastImageCompare();
$bool = $instance->areSimilar('path/to/first/image','path/to/second/image',0.05);//allow max 5% difference

Basic Example 2 - Duplicates and Uniques in directory

This example uses the default metric, MAE (Mean Absolute Error).

Code
require_once 'vendor/autoload.php';
use pepeEpe\FastImageCompare\Utils;
use pepeEpe\FastImageCompare\FastImageCompare;

$instance = new FastImageCompare();
$percentage = 0.10; // 100% = 1.0 , 10% = 0.1 , 1% = 0.01

$input = Utils::getFilesIn('/absolute/path/to/directory/with/images/');
//find duplicates/similar
$duplicates = $instance->findDuplicates($input,$percentage);
//find unique images
$uniques = $instance->findUniques($input,$percentage);

Example 3 - Multiple normalizers, 32px sample size + grayscale and RMSE metric

This example uses the RMSE metric and two normalizers.

Code
require_once 'vendor/autoload.php';
use pepeEpe\FastImageCompare\Utils;
use pepeEpe\FastImageCompare\FastImageCompare;
use pepeEpe\FastImageCompare\ComparatorImageMagick;
use pepeEpe\FastImageCompare\NormalizerSquaredSize;
use pepeEpe\FastImageCompare\NormalizerGrayScale;

$instance = new FastImageCompare('tmpdir',null);
$percentage = 0.10; // 100% = 1.0 , 10% = 0.1 , 1% = 0.01

$input = Utils::getFilesIn('/absolute/path/to/directory/with/images/');

//create comparator without normalizers
$comparatorImagick = new ComparatorImageMagick(ComparatorImageMagick::METRIC_RMSE,[]);

//register normalizer squared size with 32px
$comparatorImagick->registerNormalizer(new NormalizerSquaredSize(32));

//register normalizer grayscale
$comparatorImagick->registerNormalizer(new NormalizerGrayScale());

//register comparator with attached normalizers
$instance->registerComparator($comparatorImagick);

//find duplicates/similar
$duplicates = $instance->findDuplicates($input,$percentage);

//find unique images
$uniques = $instance->findUniques($input,$percentage);
