Hitachi Ltd. has developed a search technology that can find images similar to a specified image from millions of images and video data in one second.

The technology assesses the similarity of images based on image characteristics expressed as high-dimensional numeric information. This information is extracted automatically from properties of each image, such as color distribution and shapes. The technology can be applied to video search as well.
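The report does not say how Hitachi extracts these characteristics. As a rough illustration only, a color distribution can be turned into a numeric vector with a coarse color histogram. The function name `color_histogram` and the choice of 4 bins per channel are assumptions made for this sketch, not details of the Hitachi system.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Turn a list of (r, g, b) pixels (0-255 each) into a normalized histogram.

    Illustrative only: the result has bins_per_channel ** 3 dimensions and
    sums to 1.0 for a non-empty image.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in the range 0 .. n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        hist[(rb * n + gb) * n + bb] += 1.0
    total = len(pixels)
    if total == 0:
        return hist
    return [count / total for count in hist]


# Example: a tiny four-pixel "image" that is half red.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
vector = color_histogram(pixels)
```

A real system would combine several such descriptors (shape, composition and so on), which is how the vectors reach hundreds of dimensions.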

The new technology features (1) high-speed visual similarity search using two-step search clustering technology and (2) faster reading through optimized data allocation on an HDD.

The first feature is a two-step search that uses both PC memory and an HDD. When images are registered, the technology groups images with similar characteristics into clusters as it stores them. Each cluster is represented by the average value of the image characteristics it contains, and this average is also written to memory.

The technology first searches among the clusters, selecting the several clusters whose average values are closest to the characteristics of the query image. It then searches for similar images only among the images in the selected clusters.

The technology reduces search time by narrowing down the search targets before comparing them in detail. It also reduces memory consumption, because only the image characteristics that represent each cluster as a whole are written to memory.
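The two-step idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Hitachi's implementation: the report does not name the clustering method, so the sketch uses a simple rule (join the nearest cluster if it is within a hypothetical `radius`, otherwise start a new one), and a plain Python list stands in for the HDD. The class name `TwoStepIndex` and all parameter names are invented for illustration.

```python
import math


class TwoStepIndex:
    def __init__(self, radius):
        self.radius = radius   # hypothetical threshold for starting a new cluster
        self.centroids = []    # "memory": one average vector per cluster
        self.clusters = []     # stand-in for the HDD: full (id, vector) lists

    def register(self, image_id, vec):
        """Add an image to the nearest cluster, or open a new cluster."""
        best = min(range(len(self.centroids)),
                   key=lambda i: math.dist(vec, self.centroids[i]),
                   default=None)
        if best is not None and math.dist(vec, self.centroids[best]) <= self.radius:
            members = self.clusters[best]
            members.append((image_id, list(vec)))
            n = len(members)
            # Update the running average that represents this cluster.
            self.centroids[best] = [m + (v - m) / n
                                    for m, v in zip(self.centroids[best], vec)]
            return best
        self.centroids.append(list(vec))
        self.clusters.append([(image_id, list(vec))])
        return len(self.centroids) - 1

    def search(self, query, n_clusters=2, top_k=3):
        # Step 1: compare the query only against the cluster averages.
        nearest = sorted(range(len(self.centroids)),
                         key=lambda i: math.dist(query, self.centroids[i]))[:n_clusters]
        # Step 2: compare the query against images in the selected clusters only.
        candidates = [(math.dist(query, vec), image_id)
                      for cid in nearest
                      for image_id, vec in self.clusters[cid]]
        return [image_id for _, image_id in sorted(candidates)[:top_k]]


index = TwoStepIndex(radius=1.0)
for image_id, vec in [("a", [0.0, 0.0]), ("b", [0.2, 0.1]),
                      ("c", [10.0, 10.0]), ("d", [10.3, 9.9]),
                      ("e", [20.0, 0.0])]:
    index.register(image_id, vec)
result = index.search([9.9, 10.1], n_clusters=1, top_k=2)
```

The trade-off in this kind of scheme is that an image sitting in a cluster that was not selected in step 1 is never examined, so probing more clusters improves recall at the cost of speed.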

To accelerate searching further, the image characteristics recorded on the HDD must also be read faster. Feature (2) above was adopted for this purpose.

Because the image characteristics are recorded cluster by cluster, the data belonging to the same cluster are placed contiguously on the HDD. The method also optimizes the cluster allocation so that similar clusters do not sit far from each other. This shortens the distance the drive's read head must travel when seeking a target on the HDD.
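A layout of this kind might look like the following sketch. It writes each cluster's vectors back to back in one binary stream, so that a cluster can be fetched with a single seek and one sequential read, and it orders clusters with a greedy nearest-neighbor chain so that similar clusters end up adjacent. The ordering heuristic, the 32-bit float encoding and the function names are all assumptions for illustration; the report does not describe Hitachi's actual allocation method.

```python
import io
import math
import struct


def order_clusters(centroids):
    """Greedy chain: start at cluster 0, then keep appending the nearest unplaced cluster."""
    if not centroids:
        return []
    order = [0]
    remaining = list(range(1, len(centroids)))
    while remaining:
        last = centroids[order[-1]]
        nxt = min(remaining, key=lambda i: math.dist(last, centroids[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def write_clusters(f, clusters, order, dim):
    """Write clusters contiguously in the given order.

    Returns a table {cluster_id: (byte_offset, vector_count)}.
    """
    table = {}
    for cid in order:
        table[cid] = (f.tell(), len(clusters[cid]))
        for vec in clusters[cid]:
            f.write(struct.pack(f"<{dim}f", *vec))
    return table


def read_cluster(f, table, cid, dim):
    """One seek plus one sequential read returns the whole cluster."""
    offset, count = table[cid]
    f.seek(offset)
    data = f.read(count * dim * 4)  # 4 bytes per 32-bit float
    return [list(struct.unpack_from(f"<{dim}f", data, i * dim * 4))
            for i in range(count)]


centroids = [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0], [11.0, 10.0]]
clusters = {0: [[0.0, 0.0], [0.5, 0.5]], 1: [[10.0, 10.0]],
            2: [[1.0, 1.0]], 3: [[11.0, 10.0]]}
order = order_clusters(centroids)
disk = io.BytesIO()  # stands in for a file on the HDD
table = write_clusters(disk, clusters, order, dim=2)
vectors = read_cluster(disk, table, 1, dim=2)
```

Here `io.BytesIO` only simulates a file; on a real drive the benefit comes from the fact that clusters likely to be read in the same query lie near each other physically.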

"Visual similarity search technology" has already been commercialized as an image search technology. It makes it possible to search for targets based on the visual information of the image itself, such as color distribution, texture and composition.

Visual similarity search represents the visual data of a search target as a numerical vector with hundreds of dimensions (the visual characteristics). It assesses how similar two items are (the similarity) from the distance between their vectors of visual characteristics, or in other words, the gap between the visual characteristics.
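As a concrete illustration of "distance between vectors", the sketch below uses Euclidean distance and ranks a handful of images against a query. The report does not state which distance measure Hitachi uses, so Euclidean distance is an assumption here, as are the function names and the three-dimensional toy vectors (real vectors would have hundreds of dimensions).

```python
import math


def euclidean_distance(a, b):
    """Smaller distance means the two images are more similar."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, features):
    """features maps image_id -> vector; returns ids from most to least similar."""
    return sorted(features,
                  key=lambda image_id: euclidean_distance(query, features[image_id]))


features = {
    "sunset": [0.9, 0.1, 0.0],
    "forest": [0.1, 0.8, 0.1],
    "dusk":   [0.8, 0.1, 0.1],
}
ranking = rank_by_similarity([0.9, 0.1, 0.0], features)
```

Ranking every registered image this way is the exhaustive baseline that the two-step search is designed to avoid.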

Because visual similarity search requires calculating the gap between visual characteristics across hundreds of dimensions, the volume of computation becomes huge when searching through millions of targets. Reducing search time has therefore been a challenge in the industry.
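A back-of-the-envelope count shows why the cost grows so quickly and how limiting the search to a few clusters helps. The figures below (one million images, 200 dimensions, 1,000 evenly sized clusters, 5 clusters examined) are hypothetical values chosen for illustration; they are not numbers published by Hitachi, and the functions count per-dimension comparisons only, ignoring disk access time.

```python
def brute_force_ops(n_images, dims):
    """Per-dimension operations when the query is compared with every image."""
    return n_images * dims


def two_step_ops(n_images, dims, n_clusters, probed):
    """Per-dimension operations for a cluster-first search.

    Assumes evenly sized clusters (a simplification): the query is compared
    with every cluster average, then with the images in `probed` clusters.
    """
    per_cluster = n_images / n_clusters
    return (n_clusters + probed * per_cluster) * dims


exhaustive = brute_force_ops(1_000_000, 200)
clustered = two_step_ops(1_000_000, 200, n_clusters=1_000, probed=5)
```

Under these assumed figures the exhaustive search needs 200 million per-dimension operations, while the cluster-first search needs 1.2 million.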