
What Happens to Photos Once Uploaded?

The following piece was written by OAS Communications Coordinator Ryan Bower for the Snapshot Wisconsin newsletter. To subscribe to the newsletter, visit this link.

Since Snapshot reached 50 million photos, the Snapshot team felt it was a good time to address one of the most frequently asked questions about photos: what happens to them once they are uploaded by volunteers? At first, the process seems complicated, but a member of the Snapshot team, Jamie Bugel, is here to walk us through it, one step at a time.

Bugel is a Natural Resources Educator and Research Technician at the DNR, but she works on the volunteer side of Snapshot. Bugel said, “I mainly help volunteers troubleshoot issues with their equipment or with their interactions with the MySnapshot interface. I am one of the people who answer the Snapshot phone, and I help update the user interface by testing functionality. There is also lots of data management coordination on the volunteer side of the program that I help with.”

Bugel listed off a few of the more common questions she and the rest of the Snapshot team get asked, including who reviews photos after the initial classification, what happens to the photos that camera hosts can’t identify, and how mistakes get rectified. “We get asked those [questions] on a weekly to daily basis,” said Bugel.

It Starts With a Three-Month Check and an Upload

Every three months, trail camera hosts are supposed to swap out the SD card and batteries in their trail camera. At the same time, volunteers fill out a camera check sheet, including what time of day they checked the camera, how many photos were on the SD card and if there was any equipment damage.

“Volunteers should wait at least three months between checks, because checking more often would disturb the wildlife. We want to view the wildlife with as little human interference as possible,” said Bugel. “At the same time, volunteers should check [their camera] at least every three months, because batteries don’t last much longer than that. Checking this often is important to avoid missing photos.”

After the volunteer does their three-month check, they bring their camera’s SD card back home, enter the information from their camera check sheet into their MySnapshot account, and upload their photos.

Bugel said it can take anywhere from 4 to 48 hours for the photos to appear in the volunteer’s MySnapshot account. Fortunately, the server will send an email when the photos are ready, so volunteers don’t have to keep checking. Volunteers can start classifying their photos after receiving the email.

A fisher walking through the snow

Initial Classification By Camera Hosts

The first round of classification is done by the trail camera hosts. The returned photos will sit in the Review Photos section of their MySnapshot account while the host classifies the photos as Human, Blank or Wildlife. The wildlife photos are also further classified by which species are present in the photo, such as beaver, deer or coyote.

This initial classification step is very important for protecting the privacy of our camera hosts, and it also helps on the back end of data processing. Over 90% of all photos are classified at this step by the camera hosts. When they are done classifying photos, they click “review complete,” and the set of photos is sent to the Snapshot team for the second round of classification.

Staff Review

The second round of classification is the staff review. Members of the Snapshot team review sets of photos to verify that all human or blank photos have been properly flagged. “For example, a deer photo may include a deer stand in the background. That type of photo will not go to Zooniverse because there is a human object in the photo,” said Bugel. Fortunately, nearly all human photos are taken during the initial camera setup or while swapping the batteries and SD card, so they are usually clumped together and easy to spot.

The second reason that staff review photos after the initial classification is for quality assurance. Since some animal species are tricky to correctly classify, someone from the Snapshot team reviews sets to verify that the photos were tagged with the correct species. This quality assurance step helps rectify mistakes. “Sometimes there are photos classified as blank or a fawn that are actually of an adult deer,” said Bugel. “We want to catch that mistake before it goes into our final database.”

In cases where the set of photos wasn’t classified by the camera host, the team will also perform the initial classification to remove human and blank photos. The Snapshot team wants to make sure any photos that reveal the volunteer’s identity or the location of the camera are removed before those photos continue down the pipeline.

Branching Paths

At this point in the process, photos branch off to different destinations, depending on their classification. Blank (43%) and human (2%) photos are removed from the pipeline. Wildlife photos (20%) move on either to Zooniverse for consensus classification or directly to the final dataset. The remaining photos don’t yet fall into any of these categories, such as unclassified photos still awaiting initial review.

Photos of difficult-to-classify species, such as wolves and coyotes, are sent to Zooniverse for consensus classification. Bugel explained, “The photos [of challenging species] will always go to Zooniverse, even after volunteer classification and staff member verification, because we’ve learned we need more eyes on those to get the most accurate classification possible.” This extra pass is another layer of quality assurance.

Alternatively, photos with easy-to-classify species, such as deer or squirrel, go directly to the final dataset. Bugel said, “If a photo is classified as a deer or fawn, we trust that the volunteer correctly identified the species.” These photos do not go to Zooniverse.
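The branching described above can be sketched in code. This is a hypothetical illustration, not the actual Snapshot Wisconsin pipeline: the function name `route_photo`, the destination labels, and the set of “difficult” species are all assumptions based only on the examples the article gives.

```python
# Hypothetical sketch of the photo-routing logic described above.
# Destination names and the "difficult species" set are illustrative
# assumptions, not the actual Snapshot Wisconsin code.

# Species the article names as hard to tell apart (sent for consensus review)
DIFFICULT_SPECIES = {"wolf", "coyote"}

def route_photo(classification, species=None):
    """Return the next destination for a photo given its current classification."""
    if classification in ("human", "blank"):
        return "removed"            # never continues down the pipeline
    if classification == "wildlife":
        if species in DIFFICULT_SPECIES:
            return "zooniverse"     # consensus classification as extra QA
        return "final_dataset"      # easy species are trusted as classified
    return "awaiting_review"        # unclassified, still awaiting initial review

print(route_photo("wildlife", "coyote"))  # zooniverse
print(route_photo("wildlife", "deer"))    # final_dataset
```

The point of the sketch is simply that routing depends on two things at once: the broad classification (human/blank/wildlife) and, for wildlife, how error-prone the species is.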

A deer fawn leaping through

Zooniverse

Photos of difficult-to-classify species or unclassified photos move on to Zooniverse, the crowdsourcing platform, for consensus classification. “Wolf and coyote photos, for example, always go to Zooniverse, because it is so hard to tell the difference, especially in blurry or nighttime photos,” said Bugel.

The Snapshot team has run accuracy analyses for most Wisconsin species to determine which species’ photos need consensus classification. All photos of species with low accuracies go to Zooniverse.

On Zooniverse, volunteers from around the globe classify the wildlife in these photos until a consensus is reached, a process called consensus classification. An individual photo may be classified by up to eleven different volunteers before it is retired, but as few as five may suffice if a uniform consensus is reached early. “It all depends on how quickly people agree,” said Bugel.
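The retirement rule can be sketched as a few lines of logic. This is a simplified reading of the numbers in the article (retire early on five unanimous answers, cap at eleven, and require five or more matching votes for a consensus); the real Zooniverse retirement rules are configurable per project, so treat the details as assumptions.

```python
from collections import Counter

EARLY_RETIRE_COUNT = 5    # unanimous agreement needed for early retirement
MAX_CLASSIFICATIONS = 11  # hard cap of classifications per photo

def is_retired(classifications):
    """Return True once a photo has collected enough classifications to retire."""
    n = len(classifications)
    if n >= MAX_CLASSIFICATIONS:
        return True
    # Early retirement: every classifier so far named the same species
    return n >= EARLY_RETIRE_COUNT and len(set(classifications)) == 1

def consensus(classifications):
    """Most common answer, or None if no species got five or more votes."""
    species, votes = Counter(classifications).most_common(1)[0]
    return species if votes >= 5 else None  # None -> set aside for expert review

print(is_retired(["deer"] * 5))                                # True
print(consensus(["wolf"] * 4 + ["coyote"] * 4 + ["dog"] * 3))  # None
```

The second example shows the case described later in the article: eleven classifications with no species reaching five votes, so the photo falls through to expert review.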

Team members upload photos to Zooniverse in sets of ten to twenty thousand, and each set is called a season. Bugel explained, “Once all of the photos in that season are retired, we take a few days break to download all of the classifications and add them to our final dataset. Then, a Snapshot team member uploads another set of photos to Zooniverse.” Each set takes roughly two to four weeks to get fully classified on Zooniverse.

To date, over 10,400 people have registered to classify photos on Zooniverse, and these volunteers have classified around 10% of the total photos.

Expert Review

It is also possible for no consensus to be reached, even after eleven classifications. This means that no species received five or more votes out of the eleven possible classifications. These photos are set aside for later expert review.

Expert review was recently implemented by the Snapshot team and is the last step before difficult photos go into the final dataset. The team has to make sure all photos have a concrete classification before they can go into the final dataset, yet some photos never reached a consensus. Team members review these photos again, while looking at the records of how each photo was classified during initial review and on Zooniverse. While there will always be photos that are unidentifiable, expert review by staff helps ensure that every photo is classified as accurately as possible, even the hard ones.

The Final Dataset and Informing Wildlife Management

Our final dataset is the last stop for all photos. This dataset is used by DNR staff to inform wildlife management decisions around the state.

Bugel said, “The biggest management decision support that Snapshot provides right now is fawn-to-doe ratios. Jen [Stenglein] uses Snapshot photo data, along with data from other initiatives, to calculate a ratio of fawns to does each year, and that ratio feeds into the deer population model for the state.”
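As a toy illustration of the arithmetic behind that ratio: the real calculation combines Snapshot data with other initiatives inside a statistical population model, but at its core it compares fawn and doe observations. The function and counts below are invented for illustration.

```python
# Toy fawn-to-doe ratio from classified photo counts. The actual DNR
# method is model-based; this shows only the basic arithmetic.

def fawn_to_doe_ratio(fawn_count, doe_count):
    """Fawns observed per doe across a set of classified photos."""
    if doe_count == 0:
        raise ValueError("need at least one doe observation")
    return fawn_count / doe_count

# Hypothetical counts from one monitoring season
print(round(fawn_to_doe_ratio(870, 1000), 2))  # 0.87
```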

Snapshot has also spotted rare species, such as a marten in Vilas County and a whooping crane in Jackson County. Snapshot cameras even caught sight of a cougar in Waupaca County, one of only a handful of confirmed sightings in the state.

The final dataset feeds into other Snapshot Wisconsin products, including the Data Dashboard, and helps inform management decisions for certain species like elk. Now that the final dataset has reached a sufficient size, the Snapshot team is expanding its impact by feeding into other decision-making processes at the DNR and developing new products. 

The Snapshot team hopes that this explanation helps clarify some of the questions our volunteers have about what happens to their photos. We know the process can seem complicated at first, and the Snapshot team is happy to answer additional questions about it. Reach out to them by email or give them a call at +1 (608) 572 6103.

An infographic showing how photos move from download to final data

May Science Update: Maintaining Quality in “Big Data”

Snapshot Wisconsin relies on different sources to help classify our growing dataset of more than 27 million photos, including our trail camera hosts, Zooniverse volunteers and experts at Wisconsin DNR. With all these different sources, we need ways to assess the quality and accuracy of the data before it’s put into the hands of decision makers.

A recent publication in Ecological Applications by Clare et al. (2019) looked at the issue of maintaining quality in “big data” by examining Snapshot Wisconsin images. The information from the study was used to develop a model that will help us predict which photos are most likely to contain classification errors. Because Snapshot-specific data were used in this study, we can now use these findings to decide which data to accept as final and which images would be best to go through expert review.

Perhaps most importantly, this framework allows us to be transparent with data users by providing specific metrics on the accuracy of our dataset. These confidence measures can be considered when using the data as input for models, when choosing research questions, and when interpreting the data for use in management decision making.

False-positive, false-negative

The study examined nearly 20,000 images classified on the crowdsourcing platform, Zooniverse. Classifications for each species were analyzed to identify the false-negative error probability (the likelihood that a species is indicated as not present when it is) and the false-positive error probability (the likelihood that a species is indicated as present when it is not).


Figure 2 from Clare et al. 2019 – false-negative and false-positive probabilities by species, estimated from expert classification of the dataset. Whiskers represent 95% confidence intervals and the gray shading in the right panel represents the approximate probability required to produce a dataset with less than 5% error.

The authors found that classifications were 93% correct overall, but the rate of accuracy varied widely by species. This has major implications for wildlife management, where data are analyzed and decisions are made on a species-by-species basis. The graphs in Figure 2 show how variable the false-positive and false-negative probabilities were for each species, with the whiskers representing 95% confidence intervals.

Errors by species

We can conclude from these graphs that each species has a different set of considerations regarding these two errors. For example, deer and turkeys both have low false-negative and false-positive error rates, meaning that classifiers are good at correctly identifying these species and few are missed. Elk photos do not exhibit the same trends.

When a classifier identifies an elk in a photo, it is almost always an elk, but there are a fair number of photos of elk that are classified as some other species. For blank photos, the errors go in the opposite direction: if a photo is classified as blank, there is a ~25% probability that there is an animal in the photo, but there are very few blank photos that are incorrectly classified as having an animal in them.
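The two error rates can be made concrete with a small worked example. The function below computes them from paired (volunteer, expert) labels; the toy data are invented to mimic the elk pattern described above (misses are fairly common, but a photo labeled “elk” is almost always an elk). This is a sketch of the definitions, not the analysis code from Clare et al. (2019).

```python
# Minimal sketch of the two error probabilities discussed above,
# computed from paired (volunteer label, expert label) records.

def error_rates(pairs, species):
    """Return (false_negative, false_positive) probabilities for one species.

    false negative: expert says `species`, volunteer said something else
    false positive: volunteer says `species`, expert said something else
    """
    truly_present = [(v, e) for v, e in pairs if e == species]
    flagged = [(v, e) for v, e in pairs if v == species]
    fn = sum(1 for v, e in truly_present if v != species) / len(truly_present)
    fp = sum(1 for v, e in flagged if e != species) / len(flagged)
    return fn, fp

# Invented (volunteer, expert) pairs: 12 real elk photos, 3 missed;
# 10 photos labeled "elk", only 1 wrong.
pairs = [("elk", "elk")] * 9 + [("deer", "elk")] * 3 + [("elk", "deer")]
fn, fp = error_rates(pairs, "elk")
print(fn, fp)  # 0.25 0.1
```

Note that the two rates use different denominators (photos the expert says contain the species versus photos the volunteer labeled with it), which is why a species can score well on one error and poorly on the other.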

Assessing species classifications with these two types of errors in mind helps us understand what we need to consider when determining final classifications of the data and its use for wildlife decision support.

Model success

When tested, the model was successful in identifying 97% of misclassified images. Factors considered in the development of the model included: differences in camera placement between sites; the way in which Zooniverse users interacted with the images; and more.

In general, the higher the proportion of users that agreed on the identity of the animal in the image, the greater the likelihood it was correct. Even seasonality was useful in evaluating accuracy for some species – snowshoe hares were found to be easily confused with cottontail rabbits in the summertime, when they both sport brown pelage.


Not only does the information derived from this study have major implications for Snapshot Wisconsin, but the framework it presents for determining and remediating data quality can also benefit a broad range of big-data projects.