deeplearninginference.app


#Deep Learning Inference | Applications


#PEKAT VISION | Industrial visual inspection and quality assurance


#Synthesized | Labelled high-quality data generation for fraud detection


#UC Berkeley | Center for Targeted Machine Learning and Causal Inference


#DAWNBench | Deep Learning Benchmark


#MIT | Sensing, Learning & Inference Group - CSAIL


#Levatas | Builds end-to-end AI solutions, machine learning models, and human-in-the-loop systems automating visual inspection | Teamed with Boston Dynamics


#Arm | Cortex-X3 | Focus on enabling artificial intelligence and machine learning-based apps


#3DFY.ai | 3DFY Prompt | Generative AI that lets developers and creators build 3D models based on text prompts


#Deutsche Bank | Artificial intelligence to scan wealthy client portfolios


#Morgan Stanley | Experimenting with artificial intelligence


#Google | TinyML | Applying artificial intelligence to edge devices | Machine learning framework running on low-power, resource-constrained, low-bandwidth edge devices | 32-bit microcontrollers | Digital signal processors | TensorFlow Lite | TinyML applications of machine learning with a tiny footprint of a few kilobytes within embedded platforms having ultra-low power consumption, high (internet) latency, limited RAM, and flash memory | Running on Android, iOS, Embedded Linux, and microcontrollers | Applying machine learning and deep learning models to embedded systems running on microcontrollers, digital signal processors, or other ultra-low-power specialized processors | Running for weeks, months, or even years without recharging or battery replacement | IoT devices | Running 24×7 | Machine learning model executed within edge devices without any need for data communication over a network | Communicating only the results of inferences to the network | Deep learning framework using recurrent neural networks (RNN) for machine learning | Model training is batch training in offline mode | Selection of dataset, normalization, underfitting or overfitting of the model, regularization, data augmentation, training, validation, and testing are already done with the help of a cloud platform like Google Colab | Once ported to the embedded system, the model undergoes no further training; it consumes real-time data from sensors or input devices and applies the model to that data | Arduino Nano 33 BLE Sense | SparkFun Edge | STM32F746 Discovery kit | Adafruit EdgeBadge | Adafruit TensorFlow Lite for Microcontrollers Kit | Adafruit Circuit Playground Bluefruit | Espressif ESP32-DevKitC | Espressif ESP-EYE | Wio Terminal: ATSAMD51 | Himax WE-I Plus EVB Endpoint AI Development Board | Synopsys DesignWare ARC EM Software Development Platform | Sony Spresense | Image classification | Object detection | Pose estimation | Speech recognition | Gesture recognition | Image segmentation | Text classification | On-device recommendation | Natural language question answering | Digit classifier | Style transfer | Smart reply | Super-resolution | Audio classification | Reinforcement learning | Optical character recognition | On-device training
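
A minimal sketch of the offline workflow described above: train a Keras model in the cloud, then convert it to a fully integer-quantized TensorFlow Lite model for flashing onto a microcontroller. The model architecture and calibration data here are hypothetical placeholders, not a Google example.

```python
# Minimal sketch: convert a trained Keras model to an integer-quantized
# TensorFlow Lite model for a microcontroller target. Assumes TensorFlow 2.x;
# the model and calibration_images are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# ... training happens offline (e.g. in Colab) before conversion ...

calibration_images = np.random.rand(100, 96, 96, 1).astype(np.float32)

def representative_data():
    # Yield a few calibration samples so the converter can pick INT8 scales.
    for img in calibration_images:
        yield [img[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)  # flashed to the MCU; no further training on device
```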


#KoBold Metals | AI to mine rare earth metals


#DEEPX | Neural Processing Unit (NPU) for IoT devices | Deep learning accelerators | Software framework for development and deployment of deep learning models for mass-market uses | Device AI integration into edge devices | Code generation for DEEPX NPU | SDK quantizer converts trained models from 32-bit floating point (FP32) to INT8 or lower-bit integer representations | Optimizer fuses operators or exchanges the order of operators | Runtime API supports commands for model loading, inference execution, passing model inputs, receiving inference data, and a set of functions to manage the devices | Smart Camera Sensors | Machine Vision | Smart Mobility | Drone | Edge Computing | Smart Building | Smart Factory
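
The DEEPX SDK itself is not shown here; the sketch below only illustrates, in generic NumPy, the FP32-to-INT8 affine quantization that such a quantizer performs conceptually: map a float range onto [-128, 127] with a scale and zero point, then dequantize to inspect the error.

```python
# Generic illustration (not the DEEPX SDK) of FP32 -> INT8 affine quantization.
import numpy as np

def quantize_int8(x: np.ndarray):
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0 if x_max > x_min else 1.0
    zero_point = int(round(-128 - x_min / scale))  # maps x_min to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int):
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)   # placeholder FP32 weights
q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)
print("max abs quantization error:", np.abs(weights - recovered).max())
```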


#UnitX | Deep learning inspection system for automated manufacturing


#TSMC | AI chips | GPUs for Nvidia | Parallel computations to train AI models


#CoreWeave | GPU capacity via cloud | GPU-accelerated compute for large-scale users of artificial intelligence, machine learning, real-time rendering and visual effects, and for life sciences | Nvidia CUDA software programming platform ecosystem | Customers and developers building on that ecosystem with software tools and libraries, leveraging everyone else's prior work | Nvidia InfiniBand | GPUs to serve inference, the process of generating answers from AI models


#Avnet | Battery powered sensing systems | Applications processors | GPU | NPU | Power consumption


#NVIDIA | Jetson Orin | Embedded AI platform | NVIDIA Ampere architecture GPU at its core | Deep Learning Accelerator (DLA) for deep learning workloads | Deep learning recommendation models | Programmable Vision Accelerator (PVA) engine for image processing and computer vision algorithms | Multi-Standard Video Encoder (NVENC) | Multi-Standard Video Decoder (NVDEC) | Performs deep learning operations like convolutions much more efficiently than a CPU | Autonomous driving solution stack | Object detection as part of the perception stack | Proximity segmentation | Robotics Platform Software | DeepStream SDK


#AI21 Labs | Generative text AI | Large language models | Instruction models | Custom models | Multilingual support


#Edge AI and Vision Alliance | Edge AI | Visual AI | Perceptual AI


#NLP Cloud | AI models | Privacy-focused | LLaMA 2 | AI models available at the edge and on premises | Fine-tuning client AI with client data


#OpenAI | GPT | ChatGPT


#Meta | LLaMA


#Stability AI | Stable Diffusion model


#Sensor Cortek | Training metrics: accuracy, precision, false positives, false negatives, F1 score | Analyzing confusion matrices | Identifying class interaction issues
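
A minimal sketch of the training metrics listed above, computed with scikit-learn; the labels and predictions are synthetic placeholders, not Sensor Cortek data.

```python
# Compute accuracy, precision, recall, F1, and the confusion matrix
# (including false positive / false negative counts) for a binary task.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # ground-truth class labels (placeholder)
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # model predictions (placeholder)

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()          # binary case: counts of each outcome

print("confusion matrix:\n", cm)
print("false positives:", fp, "false negatives:", fn)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```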


#Groq | Language Processing Inference Engine | Language Processing Unit (LPU) | AI language applications (LLMs) | Sequences of text generated fast | Groq supports standard machine learning (ML) frameworks PyTorch, TensorFlow, and ONNX for inference | GroqCloud | Groq Compiler
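
Groq-specific tooling (GroqCloud, Groq Compiler) is not shown here; the sketch below only exports a small PyTorch model to ONNX, one of the standard interchange formats the entry lists for inference. The model and file name are placeholders.

```python
# Export a toy PyTorch model to ONNX for downstream inference tooling.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 32))
model.eval()

dummy_input = torch.randn(1, 128)    # example input defines the graph shapes
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                    # placeholder output path
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```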


#Anybotics | Workforce App | Operate ANYmal robot from device | Set up and review robot missions | Industrial Inspection


#IDS | Industrial image processing | 3D cameras | Digital twins can distinguish color | Higher reproducible Z-accuracy | Stereo cameras: 240 mm, 455 mm | RGB sensor | Distinguishing colored objects | Improved pattern contrast on objects at long distances | Z accuracy: 0.1 mm at 1 m object distance | SDK | AI based image processing web service | AI based image analysis


#NVIDIA | SDK for high-performance deep learning inference | Deep learning inference optimizer and runtime | Quantization | Layer and tensor fusion | Kernel tuning | INT8 using quantization-aware training and post-training quantization | 16-bit floating point (FP16) optimizations for deployment of deep learning inference in video streaming, recommendations, fraud detection, and natural language processing | Reduced-precision inference minimizes the latency required for real-time services and autonomous and embedded applications | Python API for defining, optimizing, and executing LLMs for inference in production | Integrated with PyTorch and TensorFlow | C++ API
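
A hedged sketch of using the TensorRT Python API to build an FP16 engine from an ONNX model, along the lines of the optimizer/runtime described above. Exact flags and method names can vary between TensorRT versions, and "model.onnx" is a placeholder.

```python
# Build a TensorRT engine from an ONNX model with FP16 enabled (sketch).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:           # placeholder model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)          # reduced-precision inference

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)                      # loaded later by the runtime
```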


#National Technical University of Athens | MariNeXt deep-learning framework detecting and identifying marine pollution | Sentinel-2 imagery | Detecting marine debris and oil spills on sea surface | Automated data collection and analysis across large spatial and temporal scales | Deep learning framework | Data augmentation techniques | Multi-scale convolutional attention network | Marine Debris and Oil Spill (MADOS) dataset | cuDNN-accelerated PyTorch framework | NVIDIA RTX A5000 GPUs | NVIDIA Academic Hardware Grant Program | AI framework produced promising predictive maps | Shortcomings: unbalanced dataset, marine water and oil spills are abundant, foam and natural organic material are less represented
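
The entry mentions data augmentation within a cuDNN-accelerated PyTorch framework; the snippet below is a generic illustration of that kind of augmentation pipeline for image patches, not the actual MariNeXt code, and the patch tensor is a placeholder.

```python
# Generic torchvision augmentation pipeline for training image patches.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=90),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

patch = torch.rand(3, 256, 256)        # placeholder RGB patch tensor
augmented = augment(patch)
print(augmented.shape)                 # torch.Size([3, 256, 256])
```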


#Neptune Labs | neptune.ai | Tracking foundation model training | Model training | Reproducing experiments | Rolling back to the last working stage of model | Transferring models across domains and teams | Monitoring parallel training jobs | Tracking jobs operating on different compute clusters | Rapidly identifying and resolving model training issues | Workflow set up to handle the most common model training scenarios | Tool to organize deep learning experiments
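
A hedged sketch of experiment tracking with the neptune.ai Python client; API details may differ between client versions, and the project name, token, and training-loop values are placeholders.

```python
# Log hyperparameters and a training metric to neptune.ai (sketch).
import neptune

run = neptune.init_run(
    project="my-workspace/my-project",     # placeholder project
    api_token="YOUR_API_TOKEN",            # placeholder credential
)

run["parameters"] = {"lr": 1e-3, "batch_size": 64, "epochs": 3}

for epoch in range(3):
    train_loss = 1.0 / (epoch + 1)         # stand-in for a real metric
    run["train/loss"].append(train_loss)   # time-series metric logging

run["sys/tags"].add(["baseline"])
run.stop()
```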


#SiMa.ai | Platform for edge AI | Enabling edge-ML application-pipeline acceleration on a single chip | MLSoC accelerates deep learning workloads | ModelSDK for model loading, quantization, and compilation in Python | Python APIs and VSCode remote execution for rapid application development and prototyping | GStreamer plugins enabling high-performance video processing pipelines | ML Pipeline Package for application deployment
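
The SiMa.ai ModelSDK and MLSoC plugins are not shown here; the snippet below only illustrates, with stock elements, the style of GStreamer pipeline referred to above, launched from Python. It assumes GStreamer and PyGObject are installed; a real deployment would insert vendor-specific plugins into the pipeline string.

```python
# Launch a stock GStreamer pipeline from Python (PyGObject).
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Test source -> conversion -> sink; vendor plugins would slot in here.
pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=120 ! videoconvert ! fakesink")
pipeline.set_state(Gst.State.PLAYING)

bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```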


#DeepLearning.AI | Courses in Deep Learning | AI Courses


#UCLA | AI model analyzing medical images of diseases | Deep-learning framework | SLice Integration by Vision Transformer (SLIViT) | Analyzing retinal scans, ultrasound video, CT, and MRI | Identifying potential disease-risk biomarkers | Using a novel pre-training and fine-tuning method | Relying on large, accessible public data sets | NVIDIA T4 GPUs, NVIDIA V100 Tensor Core GPUs, NVIDIA CUDA used to conduct research | SLIViT makes large-scale, accurate analysis realistic | Disease biomarkers help understand the disease trajectory of patients | Tailoring treatment to patients based on biomarkers found through SLIViT | Model largely pre-trained on datasets of 2D scans | Fine-tuning model on 3D scans | Transfer learned model to identify different disease biomarkers by fine-tuning on datasets consisting of imagery from very different modalities and organs | Trained on 2D retinal scans and then fine-tuned model on MRI of liver | Helping the model with downstream learning even across different imagery domains
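
A hedged PyTorch sketch of the pretrain-then-fine-tune pattern described above, not the actual SLIViT code: reuse a pretrained 2D backbone and fine-tune only a new head for a different target. The backbone here is an ImageNet-pretrained ResNet and the data is random, purely for illustration.

```python
# Transfer learning sketch: frozen pretrained backbone + new trainable head.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()                 # keep features, drop old head

for p in backbone.parameters():             # freeze pretrained weights
    p.requires_grad = False

head = nn.Linear(512, 1)                    # new head for the target biomarker
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)        # placeholder target-domain slices
labels = torch.randint(0, 2, (8, 1)).float()

logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```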


#Linux Foundation | LF AI & Data | Fostering open source innovation in artificial intelligence and data | Open Platform for Enterprise AI (OPEA) | Creating flexible, scalable Generative AI systems | Promoting sustainable ecosystem for open source AI solutions | Simplifying the deployment of generative AI (GenAI) systems | Standardization of Retrieval-Augmented Generation (RAG) | Supporting Linux development and open-source software projects | Linux kernel | Linus Torvalds
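
A toy sketch of the Retrieval-Augmented Generation (RAG) pattern mentioned above: embed documents, retrieve the closest one for a query, and build an augmented prompt. The hash-based embeddings are stand-ins for illustration only, not an OPEA component or a real embedding model.

```python
# Minimal retrieval step of a RAG pipeline with toy embeddings.
import numpy as np

docs = [
    "The Linux kernel was created by Linus Torvalds.",
    "OPEA aims at flexible, scalable generative AI systems.",
    "RAG augments a language model with retrieved context.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Hash-based bag-of-words embedding, purely for illustration.
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

doc_vecs = np.stack([embed(d) for d in docs])

query = "What does RAG add to a language model?"
scores = doc_vecs @ embed(query)            # cosine similarity of unit vectors
best = docs[int(np.argmax(scores))]

prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(prompt)                               # would be sent to the generator LLM
```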


#Tampere University | Pneumatic touchpad | Soft touchpad sensing force, area and location of contact without electricity | Device utilises pneumatic channels | Can be used in environments such as MRI machines | Soft robots | Rehabilitation aids | Touchpad does not need electricity | It uses pneumatic channels embedded in the device for detection | Made entirely of soft silicone | 32 channels that adapt to touch | Precise enough to recognise handwritten letters | Recognizes multiple simultaneous touches | Ideal for use in devices such as MRI machines | If cancer tumours are found during MRI scan, pneumatic robot can take biopsy while patient is being scanned | Pneumatic device can be used in strong radiation or conditions where even small spark of electricity would cause serious hazard


#Cyclopharm | Nuclear medicine device | Patient breathes in a superheated radioactive, gas-like substance | Better lung imaging | Detects pulmonary embolisms (blood clots) | Visualization of pulmonary ventilation | Chronic obstructive pulmonary disease (COPD) | Asthma | Long Covid | Dry-carbon nanoparticles irradiated with the isotope technetium-99 | Particles are 150 nanometres | Heating carbon crucible to 2,700 degrees Celsius | Only three to four breaths required | Gas works as imaging agent | Computed tomography (CT) camera | Nanoparticles have a six-hour radioactive life | Technegas images fine alveoli, where oxygen and blood mix | Technegas suitable for pregnant patients and those with poor kidney function or who are allergic to imaging agents | Device is fully reimbursed, stand-alone | Technegas generator


#University of Texas Southwestern Medical Center | AI tool analyzes time-series MRIs and clinical data to identify metastasis | Providing crucial, noninvasive support for doctors in treatment planning | Helping patients avoid unnecessary surgery and improve outcomes | Four-dimensional convolutional neural network (4D CNN) | Model trained using dynamic contrast-enhanced MRI (DCE-MRI) | Clinical datasets used from 350 women recently diagnosed with breast cancer that had spread to lymph nodes | Researchers used Nucleus Compute Cluster | Researchers built and trained 4D deep learning model employing NVIDIA A100 Tensor Core and NVIDIA V100 Tensor Core GPUs | AI model processes data in 4D, examining data from 3D MRI scans while accounting for changes over time | Model learns features of tumors and nearby lymph nodes by analyzing multiple images over time | Model integrates clinical data such as age, tumor grade, and breast cancer markers | Accurately identifying patterns associated with cancer-free or cancer-affected lymph nodes | Decreasing need for additional imaging and reducing the number of invasive procedures for patients | Preventing breast cancer patients from undergoing unnecessary sentinel node biopsies and axillary lymph node dissection (ALND), reducing the risks, complications, and resources associated with these procedures
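
A hedged PyTorch sketch, not the actual UT Southwestern model, of one way to fuse a CNN over time-resolved 3D MRI volumes with tabular clinical features: the time points are folded into the channel dimension of a 3D convolution, and all shapes and inputs are placeholders.

```python
# Fuse a 3D CNN over MRI time series with clinical features (sketch).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, time_points=4, clinical_dim=8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(time_points, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),       # global pooling over the volume
            nn.Flatten(),
        )
        self.clinical = nn.Sequential(nn.Linear(clinical_dim, 16), nn.ReLU())
        self.head = nn.Linear(16 + 16, 1)  # cancer-free vs cancer-affected node

    def forward(self, volumes, clinical):
        # volumes: (batch, time, depth, height, width); clinical: (batch, clinical_dim)
        return self.head(torch.cat([self.cnn(volumes),
                                    self.clinical(clinical)], dim=1))

model = FusionNet()
volumes = torch.randn(2, 4, 32, 64, 64)    # placeholder DCE-MRI time series
clinical = torch.randn(2, 8)               # placeholder age, grade, markers
print(model(volumes, clinical).shape)      # torch.Size([2, 1])
```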


#NVIDIA | GH200 Grace Hopper Superchip | Improving time to first token (TTFT) in multiturn user interactions | Building contextual understanding of the input sequence | Retrieval Augmented Generation (RAG) | Converting the user prompt to tokens and then to highly dense vectors | Dot product operations | Building a mathematical representation of the relationship between all tokens in the prompt | Operations repeated across different layers of the model | Generating the key-value cache (KV cache) | Reusing the KV cache | KV cache offloaded from GPU memory to higher-capacity, lower-cost CPU memory | KV cache reloaded back to GPU memory | KV cache offloading eliminates the need to recompute the KV cache without holding up valuable GPU memory | Enabling multiple users to interact with the same content without recalculating the KV cache for each new user | Speeding code generation in integrated development environments (IDEs) with LLM capabilities: one or more developers can submit multiple prompts interacting with a single code script over extended periods of time; offloading the initial KV cache calculations onto CPU memory and then reloading them for subsequent interactions avoids repeated recalculations and saves valuable GPU and infrastructure resources | NVIDIA TensorRT-LLM | Llama 3 70B model running on a server with NVIDIA H100 Tensor Core GPUs connected through PCIe to an x86 host processor | Compelling deployment strategy for data centers
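
A toy PyTorch illustration of the KV cache offload idea described above, not the TensorRT-LLM implementation: keep the prompt's key/value tensors, park them in CPU memory, and move them back to the GPU when the same context is reused instead of recomputing them. The cache-building function is a stand-in.

```python
# Offload a toy KV cache to CPU memory and reload it for reuse.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def compute_kv(prompt_tokens: torch.Tensor):
    # Stand-in for the attention layers that build the KV cache.
    keys = torch.randn(prompt_tokens.shape[0], 64, device=device)
    values = torch.randn(prompt_tokens.shape[0], 64, device=device)
    return keys, values

prompt = torch.randint(0, 32000, (512,), device=device)
kv_cache = compute_kv(prompt)                       # first turn: full prefill

# Offload to host memory while the GPU serves other requests.
kv_cache_cpu = tuple(t.to("cpu") for t in kv_cache)
del kv_cache
if device == "cuda":
    torch.cuda.empty_cache()

# Later turn with the same context: reload instead of recomputing.
kv_cache = tuple(t.to(device, non_blocking=True) for t in kv_cache_cpu)
print(kv_cache[0].device, kv_cache[0].shape)
```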


#Allen Institute for Artificial Intelligence | Robot planning precise action points to perform tasks accurately and reliably | Vision Language Model (VLM) controlling robot behavior | Introducing an automatic synthetic data generation pipeline | Instruction-tuning the VLM to robotic domains and needs | Predicting image keypoint affordances given language instructions | RGB image rendered from a procedurally generated 3D scene | Computing spatial relations from the camera perspective | Generating affordances by sampling points within object masks and object-surface intersections | Instruction-point pairs fine-tune the language model | RoboPoint predicts 2D action points from image and instruction, which are projected into 3D using a depth map | Robot navigates to these 3D targets with a motion planner | Combining object and space reference data with VQA and object detection data | Leveraging spatial reasoning, object detection, and affordance prediction from diverse sources | Enabling combinatorial generalization | Synthetic dataset used to teach RoboPoint relational object reference and free space reference | Red and green boxes as visual prompts to indicate reference objects | Cyan dots as visualized ground truth | NVIDIA | Universidad Catolica San Pablo | University of Washington
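
A hedged sketch of the 2D-to-3D step described above: back-project predicted image keypoints into camera coordinates using a depth map and pinhole intrinsics. The intrinsics, depth map, and points are placeholders, not RoboPoint outputs.

```python
# Back-project 2D action points into 3D camera coordinates with a depth map.
import numpy as np

fx, fy = 600.0, 600.0          # placeholder focal lengths (pixels)
cx, cy = 320.0, 240.0          # placeholder principal point

depth = np.full((480, 640), 1.5, dtype=np.float32)   # placeholder depth (m)
points_2d = np.array([[300, 220], [350, 260]])       # predicted (u, v) points

def backproject(points_2d, depth):
    out = []
    for u, v in points_2d:
        z = float(depth[v, u])                 # depth at the pixel
        x = (u - cx) * z / fx                  # pinhole camera model
        y = (v - cy) * z / fy
        out.append((x, y, z))
    return np.array(out)

targets_3d = backproject(points_2d, depth)     # passed to the motion planner
print(targets_3d)
```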