
jdavid_rp

macrumors regular
Feb 25, 2020
237
766
Not true. The detection of offending photos still needed iCloud servers. There was no mechanism for checking photos you didn't upload to iCloud, and the phone couldn't complete the check on its own. Checks against a database of known illegal images couldn't possibly have any utility for photos you have just taken yourself, so why would they do that? The feature was mentioned in the iCloud T&Cs as 'pre-screening'.

Also, less desirable governments have a whole stack of features on iPhones they could exploit far more readily than this one. Photos, for example, already scans your images to understand the content and parse readable text, recognise people, and of course it tracks the date, time and GPS coordinates of each photo you take. All that is far more invasive than anything the CSAM detection could do, but nobody cares about that because it's not as overtly sinister as the topic of CSAM.
Nobody cares because in those features no one except Apple is involved. A government can’t come to Apple and say “please detect this flag on the users’ pics and send the pic and user info to us”. But in this case they could say “We found new illegal content and we made new hashes, add them to the database” and Apple would have no idea.

And if they could actually do the first thing I said, they wouldn’t need this CSAM thing anyway.
 

mw360

macrumors 68020
Aug 15, 2010
2,045
2,423
Nobody cares because in those features no one except Apple is involved. A government can’t come to Apple and say “please detect this flag on the users’ pics and send the pic and user info to us”. But in this case they could say “We found new illegal content and we made new hashes, add them to the database” and Apple would have no idea.

And if they could actually do the first thing I said, they wouldn’t need this CSAM thing anyway.

Of course Apple would have an idea, because their human review of matches would keep producing non-CSAM image matches, and they'd know right away that the DB had been compromised and would not forward those results to law enforcement.
 

jdavid_rp

macrumors regular
Feb 25, 2020
237
766
Of course Apple would have an idea, because their human review of matches would keep producing non-CSAM image matches, and they'd know right away that the DB had been compromised and would not forward those results to law enforcement.
That team, afaik, would compare cropped versions of both the user’s pic and the CSAM database pic (cropping both to avoid exposing the employee to the full illegal pic, for their mental health, and to avoid showing the full user pic, to keep some privacy in case it’s a false positive). If the flagged pic is in the database, the employee will mark it as a real match, because even if it isn’t showing any child whatsoever, the cropped user image matches the database image.
 

mw360

macrumors 68020
Aug 15, 2010
2,045
2,423
That team, afaik, would compare cropped versions of both the user’s pic and the CSAM database pic (cropping both to avoid exposing the employee to the full illegal pic, for their mental health, and to avoid showing the full user pic, to keep some privacy in case it’s a false positive). If the flagged pic is in the database, the employee will mark it as a real match, because even if it isn’t showing any child whatsoever, the cropped user image matches the database image.

It's true that the employee could never see either original image, but the degradation of the thumbnails can only ever be algorithmic, not human-guided to crop out certain things for decency. The degradation happens on the phone, and on the phone the system made no attempt to understand the content of the photo, so it couldn't do anything like that. Some thumbnails would probably happen to omit anything incriminating, but remember the review would be looking at upwards of 30 images in a collection. I recall Apple saying this human step was necessary to check the collection was actually CSAM imagery (as far as the law would allow them to). What otherwise would be the point of the review?
 

Grey Area

macrumors 6502
Jan 14, 2008
423
1,004
The training here matters, not just thresholds. If the network is trained to find contextually similar images, or not trained against it, it may cluster "things in fields" together. If the network is trained to find specific images and trained away from finding similar but distinct images, as Apple does here, it will better discriminate between them-- generate hashes a greater distance from the target hash. If non-target images of the types people are concerned about (generally just any image involving nudity) are in the training set of distraction images, it will train the network to differentiate on features other than simple nudity.

The intention here is to have the hash be invariant to perceptually similar transformations (crops, color shifts, noise, etc) while highly sensitive to differences in content.

I think people are generalizing a few things neural nets can do into assuming NeuralHash is using them for those things. It's hard to know what the collisions would look like just from what we know, but most of the examples I've seen published of hash collisions against NeuralHash are very distinct images.
Yeah, but I think that is because people specifically create the colliding images using gradient attacks, and having wildly different pictures with the same hash is very illustrative and attention grabbing. Visually similar images of different things are in my opinion the greater risk in the real world.

When we tested perceptual hashing methods, we ran them on millions of news media images and manually looked at clusters. It is very easy for a human to look at a page with 50 images and tell if they are all about the same or some are different, and indeed, often there would be one or two that should not be there. They would still be similar, as in my cow/tractor example where both images are mostly green (field) and blue (sky).

A problem here is that whether a difference is significant can be very subjective or context dependent. For an extreme example, three pixels difference between otherwise identical pictures may be irrelevant noise for most picture pairs, or all there is between "all quiet" and "incoming drone, fire!" when this is what an automated air defence cannon is looking at.

The reverse engineered version of NeuralHash indicates it was nothing ground breaking as far as perceptual hashing goes. Maybe Apple's final version would have set new standards, but I think it is more likely it would have the same issues as other perceptual hashes.
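For anyone who wants to see what a basic perceptual hash actually does, here is a rough Python sketch of a difference hash (dHash), one of the simpler published techniques. To be clear, this is not NeuralHash, and the file names are made up for illustration; it just shows why two "green field, blue sky" photos can land only a few bits apart:

```python
# Toy difference hash (dHash) - a simple perceptual hash, NOT NeuralHash.
# Requires Pillow: pip install Pillow
from PIL import Image

def dhash(path, hash_size=8):
    """Return a 64-bit perceptual hash of the image at `path`."""
    # Shrink to (hash_size+1) x hash_size greyscale so only coarse structure survives.
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size), Image.LANCZOS)
    px = list(img.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = px[row * (hash_size + 1) + col]
            right = px[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes (small = 'perceptually similar')."""
    return bin(a ^ b).count("1")

# Hypothetical file names, purely for illustration.
h1 = dhash("cow_in_field.jpg")
h2 = dhash("tractor_in_field.jpg")
print(hamming(h1, h2))  # two mostly-green/blue scenes can land surprisingly close
```

Real systems threshold on that Hamming distance; as I understand it, Apple's design instead trained the network so near-duplicates produce the exact same hash, so the blinded lookup could be an exact match.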

Edit: removed a misplaced quote box
 
Last edited:

StuBeck

macrumors 6502a
May 6, 2008
791
1,180
Apple has no more ability to unlock your phone than anyone else.
Apple specifically stated they would not give the unlock code in this case. If they truly didn't have the ability to unlock a phone, they would have stated that instead.
 

theluggage

macrumors 604
Jul 29, 2011
7,554
7,477
Which misreads what I was saying and goes out of its way to ignore all the words around it. Also, I refer you back to this at the beginning:
if you read this and walk away thinking every image has a 50% chance of a match, you're understanding it wrong.

You're still missing the point. It doesn't matter whether the probability is 0.5 or 1.0E-20, you simply can't say:

probability of 1 event = p​
probability of 2 events = p x p​

without adding:
assuming that the events are uncorrelated/independent

Without that assumption (which you will see stated in any reputable explanation of conditional probability) your mathematics is wrong. You really can't use coin tosses/dice throws/radioactive decay/whatever - which are all known good approximations to independent events - as a model/analogy/example of false matches unless you know that those false matches are independent. The whole Sally Clark debacle was because someone applied the "independent events" model to a poorly understood situation where the events turned out not to be independent.
 
Last edited:

theluggage

macrumors 604
Jul 29, 2011
7,554
7,477
Probability any single test is wrong (false positive, since we've assumed everyone's honest) = 0.5^30
Probability any single test is correct = 1 – 0.5^30
Probability that every one of the 10^6 tests are correct = (1 – 0.5^30)^(10^6)
I agree with your general point that very small probabilities add up when applied to a large population, but it can't be stressed too much that the math above is just plain wrong unless the probabilities refer to independent events.

Coin tosses, dice throws, crypto-grade random number generators all produce independent events: i.e. even if you have just (by some minuscule chance) thrown 10 double sixes in a row, the chances of throwing an 11th double-six are still only 1/36.

False matches from perceptual image hashing against any individual's photo collection would likely not be independent - if (say) the pattern on your wallpaper generates a false match then the chances of you having a second picture including that wallpaper are just about 1.

If Apple's NeuralHash (unlike other perceptual hashing systems) isn't prone to that sort of error then it's for Apple to prove. What they can't do is make the possibility go away by saying "oh, but we'll look for 30 matches before acting".
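To put rough numbers on both points, here's a quick Python sketch. The first half is the "independent events" arithmetic; the second half is a crude simulation with completely invented parameters, where an unlucky scene (that wallpaper, say) appears in several near-duplicate shots, so false matches pile up on a handful of users instead of spreading evenly across the population:

```python
import random

p = 0.5 ** 30   # assumed per-image false-match rate, taken at face value

# Independent-events arithmetic: chance that a library of n images has at least one false match.
for n in (1_000, 20_000):
    print(f"{n} independent images: {1 - (1 - p) ** n:.2e}")

# Crude simulation of CORRELATED false matches (all numbers invented for illustration).
# Each user photographs some scenes several times; if a scene's content happens to
# collide with a database hash, every shot of that scene tends to collide too.
random.seed(1)
N_USERS, SCENES_PER_USER, THRESHOLD = 50_000, 200, 30
P_BAD_SCENE = 1e-5          # hypothetical chance a given scene collides at all
over_threshold = 0
for _ in range(N_USERS):
    matches = 0
    for _ in range(SCENES_PER_USER):
        if random.random() < P_BAD_SCENE:
            matches += random.randint(5, 40)   # many near-duplicate shots of the unlucky scene
    if matches >= THRESHOLD:
        over_threshold += 1
print(f"{over_threshold} of {N_USERS} simulated users cross the 30-match threshold")
```

Under the independent model essentially nobody ever reaches 30 matches; once matches are allowed to cluster, a small but non-zero number of users do, which is the whole objection.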
 

theluggage

macrumors 604
Jul 29, 2011
7,554
7,477
Of course Apple would have an idea, because their human review of matches would keep producing non-CSAM image matches, and they'd know right away that the DB had been compromised and would not forward those results to law enforcement.
But Apple don't have your original image - it's either encrypted (or still only on your device if it has been blocked from uploading) - so how can they do that human review in any meaningful way? As I understood it, they didn't have access to the "known CSAM" images (which would be illegal to possess) - just the hashes provided by an NGO. All they could do would be to confirm that the hashes were indeed equal (which they would need to do, because if you're doing billions of high-stakes computations a day, even random cosmic-ray bit flips are a significant possibility), which wouldn't guard against systematic false matches or innocent images in the CSAM database.

...and the trouble is, if that's not the way the "human review" works then it starts to contradict Apple's claims about privacy, end-to-end encryption, not having a backdoor into your data and not "searching your iPhone".
 

theluggage

macrumors 604
Jul 29, 2011
7,554
7,477
Nobody cares because in those features no one except Apple is involved. A government can’t come to Apple and say “please detect this flag on the users pics and send the pic and user info to us”. But in this case they could say “We found new illegal content and we made new hashes, add them to the database” and Apple would have no idea.
...and there's a broader concern that isn't directly connected to the technical details.

The UK is currently working on legislation to effectively require a backdoor into any "end to end" encryption to stop Bad People of various descriptions. I'm sure the US is looking on with interest. Apple are - to their credit - resisting this, which is probably one reason why they needed to backtrack on CSAM detection.

Simply by implementing this system, Apple would be setting a precedent: accepting responsibility for checking users' files for "contraband" before they were uploaded to a personal cloud storage service that offered "end-to-end encryption" - not to mention providing a nice proof-of-concept of the required technology. Regardless of the details, the practical upshot is still a partial backdoor into end-to-end encryption (whichever way you cut it, Apple get to "know things" about the content of the encrypted file), and Apple would be getting that accepted by the public and industry. If this scheme were successful it would make it much, much easier for subsequent governments to ask "Hey, if you can detect evil kiddie porn and still claim to offer end-to-end encryption, why can't you detect (anything else we say is just as evil)?" and there's really no good answer to that.

Ultimately, the government can always pass legislation and say "do what we say or take your business elsewhere" - but that's much harder to fight once you've established that what they're asking could be done by just tweaking a few parameters in what you already do - especially if you've been evangelising how wonderful that technology is and arguing as to why it doesn't affect people's privacy or security.
 
  • Like
Reactions: Grey Area

mw360

macrumors 68020
Aug 15, 2010
2,045
2,423
But Apple don't have your original image - it's either encrypted (or still only on your device if it has been blocked from uploading) - so how can they do that human review in any meaningful way? As I understood it, they didn't have access to the "known CSAM" images (which would be illegal to possess) - just the hashes provided by an NGO. All they could do would be to confirm that the hashes were indeed equal (which they would need to do, because if you're doing billions of high-stakes computations a day, even random cosmic-ray bit flips are a significant possibility), which wouldn't guard against systematic false matches or innocent images in the CSAM database.

...and the trouble is, if that's not the way the "human review" works then it starts to contradict Apple's claims about privacy, end-to-end encryption, not having a backdoor into your data and not "searching your iPhone".

For each uploaded image Apple would get an encrypted package which contained the neural hash of the image and a 'derivative' of the image (widely thought to be a low resolution thumbnail). The package was encrypted in a fashion which meant (briefly) that it could not be decrypted unless the image it refers to matched an image in the CSAM DB. It's the ability to decrypt the package on the iCloud servers which is the test of a CSAM image match - there was no equality test of hashes on the phone. Also, the package could only ever be unlocked as part of a set of 30-or-so, never in isolation. Once those packages were unlocked, the human reviewer would see the set of 30 thumbnails and decide whether it looked like a concerning collection of pictures, or the product of hash collisions (which theoretically would be a random set of pictures).
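For anyone who wants the gist in code, here's a heavily simplified Python sketch of the "successful decryption is the match result" idea. I've made up the details: the voucher key here is derived directly from the image hash, whereas the real protocol used elliptic-curve blinding (private set intersection) precisely so that neither the phone nor a third party could run this check in the clear, and the threshold layer is left out entirely:

```python
# Toy sketch: "decryption succeeds" IS the match result. Not Apple's actual protocol;
# the real design used EC blinding (PSI) plus threshold secret sharing on top of this idea.
import hashlib, hmac

def key_for(neural_hash: bytes) -> bytes:
    # Derive a symmetric key from a (stand-in) perceptual hash value.
    return hashlib.sha256(b"voucher-key|" + neural_hash).digest()

def make_voucher(neural_hash: bytes, thumbnail: bytes) -> bytes:
    # Device side: wrap the visual derivative so it only opens under the hash-derived key.
    key = key_for(neural_hash)
    tag = hmac.new(key, thumbnail, hashlib.sha256).digest()
    stream = hashlib.sha256(key + b"|stream").digest() * (len(thumbnail) // 32 + 1)
    ct = bytes(a ^ b for a, b in zip(thumbnail, stream))
    return tag + ct

def try_open(voucher: bytes, db_hashes):
    # Server side: try the key derived from each database hash; only a matching
    # image's voucher authenticates. A non-match yields nothing readable.
    tag, ct = voucher[:32], voucher[32:]
    for h in db_hashes:
        key = key_for(h)
        stream = hashlib.sha256(key + b"|stream").digest() * (len(ct) // 32 + 1)
        pt = bytes(a ^ b for a, b in zip(ct, stream))
        if hmac.compare_digest(hmac.new(key, pt, hashlib.sha256).digest(), tag):
            return pt
    return None

db = [b"\x01" * 12, b"\x02" * 12]                 # pretend hash list (opaque to the phone)
v = make_voucher(b"\x07" * 12, b"holiday photo")  # hash not in db -> stays sealed
print(try_open(v, db))                            # None
v = make_voucher(b"\x02" * 12, b"matching photo")
print(try_open(v, db))                            # b'matching photo'
```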
 

theluggage

macrumors 604
Jul 29, 2011
7,554
7,477
For each uploaded image Apple would get an encrypted package which contained the neural hash of the image and a 'derivative' of the image (widely thought to be a low resolution thumbnail). The package was encrypted in a fashion which meant (briefly) that it could not be decrypted unless the image it refers to matched an image in the CSAM DB. It's the ability to decrypt the package on the iCloud servers which is the test of a CSAM image match - there was no equality test of hashes on the phone. Also, the package could only ever be unlocked as part of a set of 30-or-so, never in isolation. Once those packages were unlocked, the human reviewer would see the set of 30 thumbnails and decide whether it looked like a concerning collection of pictures, or the product of hash collisions (which theoretically would be a random set of pictures).
Thanks. That's actually helpful and informative (and I haven't seen it explained before).
 

TheOldChevy

macrumors 6502
May 12, 2020
442
797
Switzerland
Let's assume everyone's honest, and that the probability any single test is wrong is 0.5^30, and we want to determine the chance we'll get one or more false positives in a city of 1M. Then I believe this is correct:

Probability any single test is wrong (false positive, since we've assumed everyone's honest) = 0.5^30
Probability any single test is correct = 1 – 0.5^30
Probability that every one of the 10^6 tests are correct = (1 – 0.5^30)^(10^6)
Probability that the above is not the case, i.e., that one or more tests is wrong = 1 – (1 – 0.5^30)^(10^6) ≈ 0.001

Thus the probability that one or more honest citizens get a false positive is actually ≈ 0.1%. With a population of 100M, the probability increases to ≈ 10%.

Probability of false positive is not the main issue in my opinion.

The main issue is that once the system is set up, the company (Apple or another) and then any government may ask the company to look for other illegal images. And "illegal images" can vary from what most of us here consider illegal to things that only some governments find illegal. And of course it can get hashes for full images, or hashes for image elements, like text on the image or flags... basically, by enabling this feature for a good purpose, you enable a process that can be used for other purposes. That's why I am glad that it was not enabled. But just knowing that it is feasible does not make me comfortable about our future.
 

hagar

macrumors 68000
Jan 19, 2008
1,999
5,042
For each uploaded image Apple would get an encrypted package which contained the neural hash of the image and a 'derivative' of the image (widely thought to be a low resolution thumbnail). The package was encrypted in a fashion which meant (briefly) that it could not be decrypted unless the image it refers to matched an image in the CSAM DB. It's the ability to decrypt the package on the iCloud servers which is the test of a CSAM image match - there was no equality test of hashes on the phone. Also, the package could only ever be unlocked as part of a set of 30-or-so, never in isolation. Once those packages were unlocked, the human reviewer would see the set of 30 thumbnails and decide whether it looked like a concerning collection of pictures, or the product of hash collisions (which theoretically would be a random set of pictures).
Nice explanation. But totally wrong. Instead of scanning images in the cloud, the system would have performed on-device matching using a database of known CSAM image hashes provided by the NCMEC and other child safety organizations. Apple would have transformed this database into an unreadable set of hashes that is securely stored on users’ devices.
 
  • Like
Reactions: VulchR

VulchR

macrumors 68040
Jun 8, 2009
3,406
14,294
Scotland
Similar images do not create the same hash.
There were published examples of how this could happen, because matches to the hash were not required to be exact in Apple's scheme. Also, the hashes were not pixel-by-pixel templates (otherwise they would take up a huge amount of space, and any complete representation of an entire picture would be as illegal as the picture). The hashes entailed data reduction techniques and so they would always be approximations of the targeted files. That also means that a given hash was not unique to one picture. This is why Apple's proposal required human reviewers before passing the information on to law enforcement.
 
  • Like
Reactions: hagar

mw360

macrumors 68020
Aug 15, 2010
2,045
2,423
Nice explanation. But totally wrong. Instead of scanning images in the cloud, the system would have performed on-device matching using a database of known CSAM image hashes provided by the NCMEC and other child safety organizations. Apple would have transformed this database into an unreadable set of hashes that is securely stored on users’ devices.

It isn't totally wrong, my friend, though you might say it comes down to the semantics of 'scanning' in the end.

The phone software encrypted the image package using a specially crafted cryptographic combination of both the image's hash and the encrypted hashes from the DB. It was never able to decrypt the NCMEC hash DB locally, nor encrypt local hashes using the DB's public key, nor by any other means test whether one thing equalled another. The phone performed the encryption process on every image, and uploaded every output without knowing what it meant.

Only when attempting to decrypt those packages could any result be learned, and they could only be decrypted by Apple on iCloud because only Apple had the necessary keys to the DB part of the system. Again, only matching packages could be decrypted. The act of successful decryption was the actual positive match result.

Even if you don't believe me, it doesn't make sense to test on the device. If the phone performed the hash check, the phone would know the result, and if the phone could know it, so could a security researcher, and within days or hours deliberately crafted colliding images would surface to cause havoc with the system. And on the opposite end, the CSAM users could also easily test tools to thwart detection.
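And on the "only ever unlocked as part of a set of 30-or-so" point I mentioned earlier: Apple's published design enforced that with threshold secret sharing, so the key protecting the voucher contents couldn't even be assembled until enough matches existed. Here's a bare-bones Shamir sketch in Python (tiny threshold and toy parameters, purely to show the mechanism):

```python
# Minimal Shamir secret sharing over a prime field: any t shares reconstruct the
# secret, any t-1 reveal nothing useful. Toy parameters for illustration only.
import random

P = 2**127 - 1          # a Mersenne prime, fine for a demo field
random.seed(0)

def split(secret: int, t: int, n: int):
    """Split `secret` into n shares, any t of which reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):  # evaluate the degree-(t-1) polynomial at x
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

account_key = 123456789                  # stand-in for the key protecting voucher payloads
shares = split(account_key, t=3, n=10)   # imagine one share riding along with each matching voucher
print(reconstruct(shares[:3]) == account_key)   # True: threshold met
print(reconstruct(shares[:2]) == account_key)   # False: below threshold, result is garbage
```

With fewer shares than the threshold, the interpolation just produces a random-looking field element, so a single match on its own tells the server nothing.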
 
Last edited:

Wanted797

macrumors 68000
Oct 28, 2011
1,725
3,622
Australia
It only would run if you uploaded your photos to iCloud, so presumably you'd have to agree to the iCloud Ts & Cs...



I never understood this argument. What's to stop bad governments from wanting to do this anyway? If they want to insist that Apple scan everyone's photos they still can demand that and Apple will likely deny the request. If they implement this hash scanning, they can demand Apple expand it, and Apple will likely deny the request.
The ability to do it on device is the slippery slope part.

Even if governments can make requests either way, requesting an expansion can be easier than requesting that Apple build something completely new.
 

Motorola68000

macrumors 6502
Sep 12, 2022
301
273
Great that many of us who warned of the dangers of client side scanning have been shown to be correct. So many posters tried to make so many excuses, but Apple knew. However, they've only now come out with information they should have released in the first place.

Governments love the idea of client side scanning, and Apple and others should do everything within their power not to ever use client side scanning, which is an abuse of privacy, an abuse of private equipment and was always the very, very slippery slope to wholesale surveillance of the most pernicious sort.
 
  • Like
Reactions: VulchR

Motorola68000

macrumors 6502
Sep 12, 2022
301
273
Apple just need to pray that criminals involved with child abuse and exploitation do not use iCloud for their ill-gotten gains, because if the police catch the criminals and find images on Apple's iCloud, the crap will hit the fan. Apple are saying their current systems for finding child abuse media on iCloud are already robust and thus they do not need to build CSAM detection into iCloud, but if the police were to find images it would prove Apple's stance on the issue is baseless and cause a huge backlash against Apple, because many would then be saying that if Apple had implemented CSAM detection as they were asked to, the images the police found would not have been there.
I and others were never against Apple or anyone else using THEIR servers for scanning material. The problem was when they wanted to bug every piece of kit for client side scanning, which could and would be abused by authorities the world over.

When you sell kit to customers, to then put surveillance software on it for client side scanning is an abuse of privacy, an abuse of consumer rights and a very, very slippery slope in what is already a surveillance society.

It would not have protected children or anyone else; it would have rendered the situation MORE dangerous, because those involved in criminality could easily bypass or fool CSAM detection, and pushing the activity even further underground would have made it much harder for the police and others to catch these people.

There was so much garbage posted by some supporting client side scanning, and now we have Apple's admission it was a bad, bad idea. Yet Apple would have had this information all along.

Was it more a question of giving this information only after the furore about it?

Client side scanning would mean a backdoor available on equipment you've paid for, equipment you bought based on its processing speed, with that speed and the energy you've paid for being usurped by Apple and others.

Client equipment should be sacrosanct. By all means Apple and others can scan their own servers, but never should client side scanning be undertaken.

The tiny minority involved in criminality would easily circumvent client side scanning, leaving the innocent to be presumed guilty.
 
  • Like
Reactions: VulchR

Motorola68000

macrumors 6502
Sep 12, 2022
301
273
It is clear you do not understand how this feature works or even what hash-checking means.

These systems work like antivirus software. They scan files for matching hashes against a database of known child abuse material compiled by law enforcement agencies.

A child having explicit photos of girl/boy-friends is not going to be flagged because it is not CSAM being circulated within known pedophile rings online.

Let’s at least get our facts straight before arguing pros and cons of systems such as these.
It's not the scanning, it's where the scanning takes place, and client side scanning should never have been on the table.
 
  • Like
Reactions: VulchR

Motorola68000

macrumors 6502
Sep 12, 2022
301
273
Young-looking adults wouldn't appear in a CSAM database. I don't know how many times we have to repeat that your private pictures wouldn't be exposed.
You draw the conclusion that CSAM detection would only ever be about child abuse, which in my opinion is totally wrong. Once a backdoor is set up, which client side scanning does represent, then end-to-end encryption becomes nonsensical, and every opportunity exists for bad players, including governments, to access whatever they like from personal devices. By all means Apple and others could scan THEIR equipment, but never should they contemplate usurping the equipment of private individuals, usurping its processing speed and energy; it would have been a very, very bad day for computing if client side scanning had been implemented. So all this about child abuse misses the point, and I suggest most strongly that child abuse was merely a ploy to get client side scanning in without too much of a problem, but industry experts were against it, understanding the dangers that many posters still don't.
 
  • Like
Reactions: VulchR

Motorola68000

macrumors 6502
Sep 12, 2022
301
273
Which could just as easily be known photos of terrorist leaders, posters with terrorist slogans, known photos of drug paraphernalia, scans of subversive pamphlets... No, it shouldn't be able to detect your own photos (unless it's a false positive) but there's no reason that the on-device algorithm needs to be modified in order to generate a hash from any photo which would match the hash of whatever images, on whatever subject, were on the list.

In any case, since when was this algorithm going to be open source and available for inspection?
Posters still seem to be stuck on CSAM, when that would be just the start, as even security experts made known.

It's where the scanning was to take place that was the problem... YOUR PRIVATE EQUIPMENT. Just because it was originally stated to be about preventing child porn etc. does not mean that was all it was intended to do, or that it would not be extended. And it would not have protected children anyway: as any law enforcement officer knows, the scum who prey on children are always one step ahead, and client side CSAM scanning would have just made it harder for the authorities to track the culprits, while making the whole user base PRESUMED GUILTY.

"Blackstone's Ratio and h his famous quote: "It is better that ten guilty persons escape than that one innocent suffer."

Client side scanning would take it to another level where billions of people would be presumed guilty.
 
  • Like
Reactions: VulchR

Analog Kid

macrumors G3
Mar 4, 2003
8,985
11,739
There was someone that posted info about the company behind the CSAM database: that it has no transparency requirements and it's a semi-government company. They would just need to add the hashes of “problematic material” (e.g. a pic depicting the US president as a clown) to the database and send the updated version to Apple.
https://forums.macrumors.com/thread...t-csam-in-icloud-photos.2400202/post-32423927

No need to request to expand the feature, as it's made to work on any kind of pic, and Apple wouldn't know, because they're just hashes, with no way to identify the pic they relate to.
And someone else replied to that and posted info from Apple on how they were planning to mitigate that risk that I will quote again here:

"The set of image hashes used for matching are from known, existing images of CSAM and only contains entries that were independently submitted by two or more child safety orga- nizations operating in separate sovereign jurisdictions. Apple does not add to the set of known CSAM image hashes, and the system is designed to be auditable. The same set of hashes is stored in the operating system of every iPhone and iPad user, so targeted attacks against only specific individuals are not possible under this design"​


The ability to do it on device is the slippery slope part.

Even if governments can make requests either way, requesting an expansion can be easier than requesting that Apple build something completely new.

Continuing from the same source above:

"We have faced demands to build and deploy government-mandated changes that degrade the privacy of users before, and have steadfastly refused those demands. We will continue to refuse them in the future. Let us be clear, this technology is limited to detecting CSAM stored in iCloud and we will not accede to any government’s request to expand it."​

So this isn't a new thing for Apple, and I'm guessing the reason they went through the trouble of developing this whole system was to find a way to address demands from law enforcement while retaining as much user privacy as possible. They put "we will not accede to any government's request to expand it" in black and white, setting them up for a huge hit to reputation and value if they ever do.

Ok, so let's leave that aside and consider Apple equally untrustworthy. Then what?

Well, there are tons of scanning processes on our devices already that are way more beneficial to oppressive governments than this crazy NeuralHash scheme. This scheme is designed to cast the narrowest net possible, looking only for images that match specific known images. That doesn't sound useful to a government looking to root out subversives.

Your phone already does a ton of AI scanning on your device: text scanning in your email, facial and text recognition in images, classification of images by general content and context categories; it knows your location and creates a database of important locations you visit frequently, has your entire contact list and calendar, and it even analyzes your movements and behaviors to classify the type of activity you are engaged in for fitness, all the way down to when it should and shouldn't charge the device.

Not to mention if you don't trust Apple you have no way of knowing what else they're doing on your device.

So if you can't trust Apple, you're already pretty screwed and any one of those other methods is way more useful to a bad government than trying to slip enough hashes of images they consider subversive into a child protection function so that the people they're looking for match at least 30 of them and then trigger a manual review and reporting to an agency aimed at child safety.

Wouldn't it be easier to just force Apple to scan text and images for faces and phrases like "down with the dictator" and send them an encrypted blip when they get a hit?
 

Analog Kid

macrumors G3
Mar 4, 2003
8,985
11,739
Yeah, but I think that is because people specifically create the colliding images using gradient attacks, and having wildly different pictures with the same hash is very illustrative and attention grabbing. Visually similar images of different things are in my opinion the greater risk in the real world.

I agree with your second statement, visually similar images are the bigger risk in the real world, which is why I disagree with the first statement. I believe it would be more attention grabbing to spoof images that represent the greater risk.

Given the fact that everyone seems worried about this particular risk, it would be quite impactful for a researcher to measure the likelihood of it happening.

And these gradient attacks are interesting in showing how collisions can happen, but creating an image that collides is not the same as finding an image that collides. These gradient attacks are proofs of concept and the images they create aren't natural images. We make the leap from "if they can look that much like a real image, they can look more like one"-- but that's not necessarily true and not necessarily likely to happen 30 times on one device.
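To illustrate what those proofs of concept actually do, here's a toy numpy version of a gradient attack against a made-up linear "hash network" (nothing like the real NeuralHash, and all parameters are invented): relax the hash bits, then nudge the pixels downhill until the bits flip to the target. The output is exactly the kind of noise-sculpted, unnatural image being described, not something you'd stumble on in a camera roll:

```python
# Toy gradient attack on a made-up linear perceptual hash, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
D, BITS = 32 * 32, 16
W = rng.standard_normal((BITS, D))           # stand-in "hash network": hash = sign(W @ x)

def toy_hash(x):
    return (W @ x > 0).astype(int)

target = toy_hash(rng.random(D))             # hash of some unrelated "target" image
x = rng.random(D)                            # start from a random "innocent" image
sign = 2 * target - 1                        # +1/-1 per bit: which side we need W @ x on

for step in range(2000):
    margin = sign * (W @ x)                  # positive margin = that bit already matches
    active = (margin < 0.1)[:, None]         # only push on bits that don't comfortably match
    grad = -(sign[:, None] * W * active).sum(axis=0)
    x = np.clip(x - 0.01 * grad, 0.0, 1.0)   # gradient step, keep pixel values in range
    if np.array_equal(toy_hash(x), target):
        print(f"collision found after {step + 1} steps")  # different-looking image, same hash
        break
else:
    print("no collision within the step budget")
```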

When we tested perceptual hashing methods, we ran them on millions of news media images and manually looked at clusters. It is very easy for a human to look at a page with 50 images and tell if they are all about the same or some are different, and indeed, often there would be one or two that should not be there. They would still be similar, as in my cow/tractor example where both images are mostly green (field) and blue (sky).

It's like saying sort algorithms perform badly on presorted arrays because quicksort does. You're talking about "perceptual hashing" as a broad category. How they are trained matters here. Perceptual hashing algorithms are designed specifically to cluster like images together-- but the training is what determines the definition of "like". What was it trained to look for as a similarity and what was it trained to discriminate between?

In the NeuralHash case it was trained to define "like" to be images that have gone through some fairly basic transformations-- crops, color shifts, etc. It was trained to discriminate between images outside those transformations.


The reverse engineered version of NeuralHash indicates it was nothing ground breaking as far as perceptual hashing goes. Maybe Apple's final version would have set new standards, but I think it is more likely it would have the same issues as other perceptual hashes.

I'd be interested in what papers you're looking at. When you say "same issue", if you mean there can be more than one image with the same hash, that's to be expected. The question is really how likely is a collision between two images that are legally different under child protection laws but still able to pass through a human review.
 
  • Like
Reactions: VulchR