Species Recognition Models
Wildlife populations are commonly monitored with audio and visual technology, such as motion-triggered camera traps or microphones. Because analysing this data manually is labour-intensive, automating the process with AI models is becoming increasingly popular. Addax provides customised machine learning models that identify the species of your choice, allowing you to efficiently manage and interpret large volumes of data.
Applications
Addax can develop custom computer vision models that automatically tag camera trap images. Animals can be detected and identified to species level, people can be classified as poachers or non-poachers, and vehicles as company or non-company cars. Any other groups you would like to distinguish are possible too. Custom identification models are also available for other applications, such as bird vocalisations or gunshot and chainsaw detection and localisation.
Training the Model
For a model to identify a particular species, it must first be trained on sample data of your target species or group, such as images or audio recordings. Besides camera traps, it is also possible to develop models for images captured by handheld, drone, underwater or thermal cameras. The amount of data you will need to supply depends on many factors, including the distinctiveness of the target species, the backgrounds and the project setups. In general, the more training data provided, the more reliable the model will be. However, don’t worry if you suspect you have too few images. Addax offers several methods to enlarge your dataset, for example by leveraging existing models to extract information from bulk folders or by filtering training data from similar ecological projects. Addax has access to more than 20 million camera trap images, covering 850 species and various other taxonomic levels.
Deployment Options
These customised AI models can be integrated into existing dataflows, set up for continuous monitoring with real-time notifications, or used through our open-source AI platform, EcoAssist. EcoAssist is a tool developed by Addax Data Science to support open-source projects in nature conservation, and is designed to facilitate image recognition and analysis for camera trap data. The platform runs models locally, so there is no need for an internet connection. Results can take the form of a spreadsheet file, visualised boxes, crops, or collections of images in subfolders based on their detections. It is also possible to further analyse the model’s predictions in Timelapse, a commonly used image analyser for camera traps. Additional features can be tailored to your needs. However, please note that AI is never perfect: just as with human identifiers, you should not expect 100% reliable results.
Reach out if you have any questions regarding the process or how Addax can help maximise efficiency through automated species identification.
Frequently Asked Questions
How should I prepare my images for training a species identification model?
The accuracy of a species recognition model largely depends on the quality of the dataset it is trained on. In computer vision, there’s a well-known saying: “garbage in, garbage out.” This means that a low-quality dataset will result in a low-quality model. Therefore, investing time and effort into creating a high-quality dataset is essential for achieving the best results.
Image variability
Species recognition models are trained on labelled images from the project area in which they will be deployed. By repeatedly analysing these examples, the model learns to identify species effectively. A good rule of thumb is to provide at least 10,000 images per class. While more images per class are always beneficial, the returns diminish as the quantity increases.
What matters most is the variability of the images. For example, 5,000 images from 100 locations are far more valuable than 10,000 images from a single location. To develop a robust model, the dataset should be as diverse and heterogeneous as possible, incorporating different locations, backgrounds, camera types, angles, weather conditions, habitats, etc.
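As a rough way to check variability before sharing a dataset, the number of images and the number of distinct locations can be counted per class. The sketch below is only an illustration: it assumes you can already list one (species, location) pair per image, and the function name `summarise_classes` is hypothetical rather than part of any Addax tool.

```python
from collections import defaultdict


def summarise_classes(records):
    """Count images and distinct locations per species.

    `records` is an iterable of (species, location) pairs, one pair per image.
    How the location is derived (folder name, camera ID, ...) is an assumption
    left to the caller.
    """
    image_counts = defaultdict(int)
    locations = defaultdict(set)
    for species, location in records:
        image_counts[species] += 1
        locations[species].add(location)
    return {
        species: {
            "images": image_counts[species],
            "locations": len(locations[species]),
        }
        for species in sorted(image_counts)
    }


if __name__ == "__main__":
    example = [
        ("ostrich", "site1"),
        ("ostrich", "site1"),
        ("ostrich", "site2"),
        ("gemsbok", "site1"),
    ]
    for species, stats in summarise_classes(example).items():
        print(species, stats)
```

A class with many images but only one or two locations is a candidate for supplementing with data from other sites.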
However, creating a perfect dataset may not always be feasible in real-world scenarios. For classes with limited data, don’t worry—Addax will work with whatever you have. When necessary, the dataset might be supplemented with images from other ecological studies. While these external images are less effective than project-specific data, they still contribute to improving the model’s accuracy.
Image tagging
As the expert on your region’s wildlife, you are best suited to annotate the data. Addax essentially requires information about which animal appears in each image or video. This information can be provided in various ways. In many cases, organisations have been labelling their images for years and already have a preferred method. That’s great! If you still need to label or organise your data, below are some points to consider. Applications like ZIP-classifier are excellent for tagging images efficiently. They allow you to organise and label large datasets effectively and prepare them for model development. Below are two common methods of labelling your data.
- Tagging by folder structure: Organise your images into folders based on species. Each class should have its own subfolder (see example below). Ideally, the folder structure within each class subfolder should be left unaltered (read more about why under 'Original folder structure').
- Tagging with spreadsheet metadata: Alternatively, you can provide a metadata file (e.g., XLSX, CSV, TXT, or JSON) with image or video file names and their corresponding species tags. Addax will then handle the folder organisation programmatically.
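The two labelling methods carry the same information, so one can be derived from the other. The sketch below reads a species-per-folder layout and writes a CSV with one row per image. It is a minimal illustration, not Addax's own tooling: the folder names, the column names `file` and `species`, the list of image extensions and both function names are assumptions.

```python
import csv
from pathlib import Path

# Assumed layout (folder and file names are hypothetical):
#   dataset/
#     ostrich/<original subfolders>/IMG_0001.JPG
#     gemsbok/<original subfolders>/IMG_0002.JPG
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: extend as needed


def labels_from_folders(root):
    """Return sorted (relative_path, species) pairs.

    The species is taken from the top-level folder under `root`; any deeper
    folder structure is kept untouched in the relative path.
    """
    root = Path(root)
    rows = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # image sits directly in root, so it has no class folder
        rows.append((relative.as_posix(), relative.parts[0]))
    return sorted(rows)


def write_label_csv(rows, csv_path):
    """Write the (file, species) pairs to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "species"])
        writer.writerows(rows)
```

Calling `write_label_csv(labels_from_folders("dataset"), "labels.csv")` would produce the metadata file without moving or renaming any image.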
Original folder structure
It is important to keep as much information about the image as possible. For example, maintaining the original folder structure helps Addax understand which images belong together and which are independent. This distinction is crucial for creating proper train, test, and validation splits during model training. While not absolutely essential, this step significantly enhances the model's accuracy.
For example, a folder structure like <organisation>/<project>/<site>/<deployment>/<image> provides valuable information for model development. Alternatively, other hierarchical structures, such as <area>/<park>/<camera>/<image>, work just as well. The key is to organise the data from large groupings to smaller ones.
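To illustrate why the hierarchy matters, the sketch below assigns whole groups of images (for example, everything from one site) to a single split, so that near-identical images from the same camera never end up in both training and test data. This is one common approach, shown under assumptions: the source does not describe how Addax actually builds its splits, and the function names, the grouping depth and the split fractions are all hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def group_key(relative_path, depth=3):
    """Return the first `depth` folder levels of a path.

    With <organisation>/<project>/<site>/<deployment>/<image> and depth=3,
    the key is <organisation>/<project>/<site>.
    """
    folders = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    return "/".join(folders[:depth])


def assign_split(key, val_fraction=0.1, test_fraction=0.1):
    """Map a group key to 'train', 'val' or 'test' deterministically.

    The key is hashed to a number in [0, 1), so the same group always gets
    the same split. Fractions apply to groups and are only approximate.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_by_group(paths, depth=3):
    """Assign every path to a split based on its group key."""
    splits = {"train": [], "val": [], "test": []}
    for path in paths:
        splits[assign_split(group_key(path, depth))].append(path)
    return splits
```

If the folder structure has been flattened, there is nothing to group on, which is why the original hierarchy is worth keeping.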
Luckily, most conservation agencies already have their own way of structuring their data, so there’s usually no need to alter any existing folder structures. Please note that Addax does not require any latitude or longitude information. If needed, you can safely remove any location metadata.
Multi-species occurrences
Images containing multiple species can present challenges during data processing. Addax’s annotation method will label all animals in an image with the tag provided by the client. For instance, the image below contains both ostriches and gemsboks in the background. If this image is labelled “ostrich”, all individual animals, including the gemsboks, will receive the “ostrich” label.
In most cases, multi-species occurrences are rare and not a major concern. However, they can create issues in specific contexts, such as studies conducted in pastures with domesticated animals or cameras monitoring watering holes. A few misclassifications are manageable and won’t significantly affect the model’s performance. However, frequent errors will impact accuracy. If you are aware of such images in your dataset, please provide a list of the affected images, deployments, or locations to help mitigate potential issues.
Please note that the resulting model will be capable of detecting multiple species in a single image. Multi-species occurrences pose a challenge only during the training phase, and such images are best excluded from the dataset or kept separate.
Duplicate folders
Duplicates in the dataset can negatively impact model accuracy. If duplicate folders or images are present, they might inadvertently end up in different data splits (train, validation, or test). The model is then evaluated on images it has already seen during training, which makes its reported accuracy overly optimistic and can hide poor generalisation. While a few duplicates won’t pose a significant problem, many of them can hinder the model’s learning process. Ensuring the dataset is free of duplicates is an important step to maintain its quality and improve the overall performance of the model.
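One simple way to look for duplicates is to compare files by a hash of their contents. The sketch below groups byte-identical files under a root folder. It is an illustration with hypothetical function names, and it only catches exact copies: images that were resized or re-encoded will not be detected this way.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    # "dataset" is a placeholder folder name.
    for group in find_duplicates("dataset"):
        print("Identical files:", ", ".join(str(p) for p in group))
```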
More questions?
If you have further questions or need additional help with labelling, feel free to reach out.