
Now Viewing: Duplicate Images?

Perpetual_Question - Group: Member - Total Posts: 1
Duplicate Images?
Posted on: 06/18/09 12:15AM

I do not see the point in duplicate images. Are they even allowed? I suppose they are, since over 700 images are tagged as such. I just thought I would ask. If duplicates are not allowed, I can start flagging them when I stumble across them.



lozertuser - Group: The Fake Administrator - Total Posts: 2230
Posted on: 06/18/09 12:32AM

Yes, please flag them for deletion and include the ID it is a duplicate of. I am currently parsing a list of duplicated images as we speak using piespy's nifty iqdb program. We should be able to eliminate the duplicates from the first 450K images.



Ferryt - Group: Member - Total Posts: 127
Posted on: 06/18/09 09:50AM

We should be careful to make sure they are, indeed, duplicate images. Two images might look the same at first glance, and even be flagged as duplicates by some duplicate-detecting software, yet still not be duplicates.

Are the sizes of the images identical? If not, and they're very close (like an 800 x 600 image vs. a 780 x 585 image), which one do you keep? I'd opt for the larger one, myself.

If one is low rez and one high rez you're going to want to keep both.

If one has a watermark and one doesn't, then we need to determine whether the watermark is part of the original image and has been shopped out of the other one, or whether it was added by some pay site. I've seen instances where people removed a copyright notice, for example. I'd rather have that on the image I keep because it gives me the source (usually the artist).

If you have several with differing degrees of artifacts then we need to keep one of them, even though I know lozertuser doesn't want images with artifacts on this board. Sometimes that's all that is available, anywhere. We need, I think, to keep the one that looks best.

Filesize? I'm convinced that some images are packed with metadata, since I've seen identically sized images before, one with an expected filesize and one that was huge by comparison. They looked exactly the same, but one would take up four to ten times as much space as the other, and both were in the same file format. I say pitch the "heavy" one. We don't know what else, besides image data, is stored in that thing, anyway.
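
One way to test the metadata suspicion is to look at where the bytes in each file actually go. The sketch below is only an illustration and assumes PNG files, which the thread does not specify. The helper names (make_png, chunk_sizes) are made up for this example. It builds two tiny images with identical pixels, one of them carrying a large text comment, and totals the bytes per chunk type.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def chunk(ctype, data):
    """Serialize one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def make_png(width, height, gray, text=b""):
    """Hypothetical helper: a flat 8-bit grayscale PNG, optionally with a tEXt comment."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by the pixel values.
    raw = b"".join(b"\x00" + bytes([gray]) * width for _ in range(height))
    parts = [PNG_SIG, chunk(b"IHDR", ihdr)]
    if text:
        parts.append(chunk(b"tEXt", b"Comment\x00" + text))
    parts += [chunk(b"IDAT", zlib.compress(raw)), chunk(b"IEND", b"")]
    return b"".join(parts)


def chunk_sizes(png):
    """Return {chunk type: total data bytes} for a PNG byte string."""
    sizes = {}
    pos = len(PNG_SIG)
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        name = ctype.decode("ascii")
        sizes[name] = sizes.get(name, 0) + length
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return sizes


lean = make_png(16, 16, 128)
heavy = make_png(16, 16, 128, text=b"x" * 5000)  # same pixels, bloated comment

print(len(lean), chunk_sizes(lean))
print(len(heavy), chunk_sizes(heavy))
```

Both files report the same IDAT (pixel data) size, and the whole size difference sits in the tEXt chunk. On real files, a large non-IDAT total would support the "packed with metadata" theory, while a large IDAT difference would point to different compression settings instead.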

There are probably other factors to consider, but I'm still working on my first cup of coffee and can't think of any.



Thref - Group: Member - Total Posts: 302
Software Does help
Posted on: 06/21/09 03:25AM

Ferryt, you are perhaps unfamiliar with just how checksums work. Please consult Wikipedia for insight and info on that matter. I did once come across a few pics that I took for dupes but later realized weren't: different facial expressions. There was also a pair where one had a shadow and one didn't. Two profiles differing in context, for another example.

I agree on the part about IF that is all that's available, and bigger being better if it retains the same or better quality.



rwx - Group: Member - Total Posts: 7
Posted on: 06/29/09 06:29AM

Ferryt said:
Are the sizes of the images identical? If not, and they're very close (like an 800 x 600 image vs. a 780 x 585 image), which one do you keep? I'd opt for the larger one, myself.

I'd choose the smaller one here, because from the dimensions it looks like it was the original, and the slightly larger one was resized to fit the desktop and thus is lower quality.

Ferryt said:
If one is low rez and one high rez you're going to want to keep both.

Why keep redundant data? Delete the smaller one.

Ferryt said:
If one has a watermark and one doesn't, then we need to determine whether the watermark is part of the original image and has been shopped out of the other one, or whether it was added by some pay site. I've seen instances where people removed a copyright notice, for example.

If the original is watermarked I'd keep both, one as the source and another as a clean image. Copyright notices belong in the comment metadata block, not in the pixels.



crimsonsani - Group: Retired Staff - Total Posts: 13
Posted on: 07/01/09 12:00PM

Belle and I always check over res and quality before we delete. Obviously, if they're simply duplicates, the dupes with lower quality and resolution will be tossed in favor of the ones with higher res and quality.
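
The rule of thumb above, combined with Ferryt's earlier suggestion to pitch the "heavy" file when dimensions match, could be sketched roughly as follows. This is not how the staff actually review duplicates. The Candidate record and pick_keeper function are hypothetical names, and the example numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one image in a group of suspected duplicates."""
    post_id: int
    width: int
    height: int
    file_bytes: int


def pick_keeper(dupes):
    """Prefer the most pixels; at equal pixel count, prefer the smaller file."""
    return max(dupes, key=lambda c: (c.width * c.height, -c.file_bytes))


group = [
    Candidate(post_id=1, width=780, height=585, file_bytes=90_000),
    Candidate(post_id=2, width=800, height=600, file_bytes=120_000),
    Candidate(post_id=3, width=800, height=600, file_bytes=480_000),
]

keeper = pick_keeper(group)
print("keep", keeper.post_id)
```

A rule like this only sees numbers. It cannot tell an upscaled copy from a true high-resolution original, or judge artifacts and watermarks, so it would be a first pass before the manual check, not a replacement for it.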



Ferryt - Group: Member - Total Posts: 127
Posted on: 07/16/09 05:30PM

Thref, I know how checksums work. I also know how duplicate detectors work. Sure, you can use checksums to determine whether two files are identical, but that won't tell you whether two different files contain the same image. In fact, if two copies differ in size by only one pixel in the x and y dimensions, they will be flagged as different, even if the larger one contains no additional content.
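
The difference between a file checksum and a content-based comparison can be shown with a toy example. The sketch below is not iqdb's algorithm. It uses a simple "average hash" as a stand-in, and the images are synthetic grayscale gradients held in plain lists so that nothing beyond the standard library is needed. All function names are made up for this illustration.

```python
import hashlib


def checksum(pixels):
    """SHA-256 over the raw grayscale bytes, row by row."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels, size=8):
    """Sample a size x size grid, then mark each sample as above or below the mean."""
    h, w = len(pixels), len(pixels[0])
    small = [
        pixels[(y * h) // size][(x * w) // size]
        for y in range(size)
        for x in range(size)
    ]
    mean = sum(small) / len(small)
    return [1 if v > mean else 0 for v in small]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def gradient(w, h):
    """Synthetic test image: a left-to-right black-to-white ramp."""
    return [[(255 * x) // (w - 1) for x in range(w)] for _ in range(h)]


a = gradient(64, 64)
b = gradient(65, 65)  # same picture, one pixel larger in x and y

print("checksums equal:", checksum(a) == checksum(b))
print("hash distance:", hamming(average_hash(a), average_hash(b)))
```

The checksums disagree because the bytes differ, while the coarse hash puts the two images at or near distance zero. The flip side is the risk raised earlier in the thread: a coarse hash can also rate two images as close when they differ only in a small detail such as a facial expression, so a match is a candidate for review, not proof.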

rwx, some of us prefer to keep larger versions of an image (assuming they're actually higher resolution, rather than just resized smaller images). The reason is that higher resolution images contain more information. I can reduce a highrez image any time I want to without degrading the visual quality. I can't make a lowrez image bigger without degrading it. On the other hand, some people have limited resources and don't want "absurdres" images. Also, there are CG sets in which the original images are given in a series of resolutions. If we're going to be archiving those, then we need to be willing to archive all of them.

Your comment regarding copyright notices is a personal opinion, not shared by most artists, and with good reason. Put it just in the metadata block and your image is not covered by International Copyright Law because the copyright notice is not visible on the image.



Thref - Group: Member - Total Posts: 302
Welcome Back Ferryt!
Posted on: 07/17/09 02:09AM

Ferryt said:

In fact, even if they differ by only one pixel in the x and y dimensions in size they will be flagged as different, even if the larger one contains no additional content.


Yeah, that's the limit. But we're banking on the fact that, unless it's a copy and paste job with artifacts, they usually won't differ by one line or pixel, and that is where the software helps. Better than no help at all.



Ferryt said:

On the other hand, some people have limited resources and don't want "absurdres" images.


I thought Absurdres was the name of an artist...
Either a really good or a really bad artist. I forget.



Ferryt said:

your image is not covered by International Copyright Law because the copyright notice is not visible on the image.


Which further proves the uselessness of metatags: nobody really uses them, and even if they do, nobody really reads them anyway. And that is a good reason we have Gelbooru :) for those who can't see the meta tags but want the content nonetheless. I find it quite humorous.
I don't have a problem with watermarks; I have a problem with watermarks that obstruct an image that would otherwise be beautiful. The same goes for idiots who don't know how to implement them properly and end up artifacting the image to ruins just to get their watermark over the focal point of said image.


