Computer Vision News
A publication by RSIP Vision
August 2024
Big News Ahead: Read Ralph's Letter!
Best Oral and Best Poster at MIDL 2024
A New Home for Computer Vision News

Dear reader,

After 8 extraordinary years and 200 issues encompassing over 7,000 pages of cutting-edge scientific content, RSIP Vision is transitioning out of publishing Computer Vision News to focus on new ventures. This beloved magazine is now seeking a new custodian to continue its legacy and expand its horizons. Could your organization benefit from this opportunity?

Opportunities and Benefits:
• Engaged Community: Reach 9,000 active subscribers specializing in Computer Vision and AI.
• Prestigious Partnerships: Ongoing collaborations with IEEE, the Computer Vision Foundation, and CVPR for three major conferences every year.
• Global Reach: Over 1 million pageviews annually, attracting readers worldwide.
• Communication Powerhouse: An established platform for engaging the CV/AI community, with unlimited potential.

Why Consider This Opportunity?
The cost of acquisition? Zero! I will bring with me the entire package: archives, subscriber base, publishing platform, contact lists, and all established processes. This is a unique opportunity to bolster your company's communication strategy and visibility in the tech community.

Interested in Learning More?
Let’s discuss how Computer Vision News can serve your company’s communication needs. Contact me directly ☺

Ralph Anzarouth
Editor, Computer Vision News
editor@computervision.news
NB: I will personally continue to be the sole and dedicated guardian of all subscribers' email addresses, ensuring they remain secure and protected. Readers' privacy comes first, and your information will never be shared or sold. Computer Vision News will continue to uphold its editorial independence and its commitment to providing high-quality, unbiased content to our community. Your trust is paramount, and we are dedicated to maintaining it during and after this transition.
MIDL Best Oral Paper Award Winner

SINR: Spline-enhanced implicit neural representation for multi-modal registration

Vasiliki Sideri-Lampretsa is a PhD student in Daniel Rueckert’s AI in Medicine Lab at the Technical University of Munich. Fresh from winning the Best Oral Award at MIDL 2024, Vasiliki speaks to us about her novel approach to deformable image registration and its significant promise for clinical practice.

Deformable image registration aims to align two images by determining a transformation for every pixel, rather than a global transformation such as a rotation or translation. Recovering these transformations at a granular level is particularly useful in medical scenarios. For instance, in cardiac imaging, different time points of the beating heart could be registered to detect abnormalities. In this paper, Vasiliki is working with brain images.

“Registration is challenging when you want to recover the transformation at every point in space because there is more than one possible solution,” she explains. “We don’t have any ground truth for this, so we don’t know what we’re looking for. As a result, we have to do this fully unsupervised. We have to do this in a way that we have surrogate
measures to measure the performance. Also, we don’t really know what a possibly good result is in the end.”

To tackle these challenges, she takes a novel approach using implicit neural representations (INRs). These multi-layer perceptron networks with periodic activation functions or positional encodings offer a compressed and continuous representation of signals. This approach is helpful for registration because it allows for recovering transformations at different resolutions, including at sub-pixel levels. The inspiration for this approach came from prior work by Wolterink et al. at MIDL 2022, which applied INRs to lung registration. However, adapting this method to brain images revealed two major issues. “The first was that the transformation needs to be smooth, because inter-brain registration is more complicated than lung registration,” Vasiliki points out. “Then we saw that using these INRs, we have way more spatial folding, meaning that, for example, the space is in a state that it’s not usable anymore or produces illegal transformations, and then the final result is not good enough. The second major thing we saw was that, with the way the sampling of the coordinates was done, we weren’t able to do multi-modal registration with mutual information.” Vasiliki tells us these issues were somewhat expected, given that the registration field is progressing more slowly than other fields in
medical imaging, such as segmentation or super-resolution. Could this be why the work caught the eye of the judges at MIDL? “I think the motivation of the work was clear, and the solution was easy to grasp,” she considers. “We saw other interesting works at MIDL, but this was the major difference. It was probably easy for everybody to understand, and the presentation was good, so that’s why we won.”

The study’s success is also a testament to the collaborative efforts and mentorship within the research group, including from Vasiliki’s main supervisor, Daniel Rueckert, and Magdalini Paschali, both dear friends of this magazine. “Daniel is an expert in registration,” she tells us. “Back in the day, he worked a lot in the registration and segmentation field, and I think this expertise from the pre-deep learning era is valuable. All these old papers and ideas can become new if you treat them the right way.” Vasiliki met Magda during her master’s thesis while working with Walter Simson in Nassir Navab’s group. “Magda is very good at organizing ideas,” she says. “Taking an idea from A and bringing it to B. That’s very useful because, at some point in a project, you don’t know where you are and need somebody to put everything in the right place.”

Vasiliki was born in Greece and studied there for her bachelor’s and first master’s before moving to Germany for her second master’s. Reflecting on her journey, she highlights the supportive research environment in Germany, where she found people with similar interests and an easier environment in which to work. “In Greece, the situation is
not that easy if you want to do research because of funding and because people have so much competition to get so little money,” she reveals. “This can lead to a toxic environment sometimes.”

As our time together draws to a close, Vasiliki thanks us for this opportunity and invites readers to delve into her paper to find out more, adding: “Reach out to me - that would be fantastic!”
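For readers curious what an INR looks like in code, here is a minimal sketch of the general idea Vasiliki describes: a small MLP with periodic (sine) activations that maps 3D coordinates to displacements and can be optimized without ground-truth correspondences. All names and sizes below are our illustrative assumptions; this is not the SINR code from the paper.

```python
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    """Linear layer followed by a sine activation (SIREN-style)."""
    def __init__(self, in_dim, out_dim, omega=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.omega = omega

    def forward(self, x):
        return torch.sin(self.omega * self.linear(x))

class DeformationINR(nn.Module):
    """Maps a 3D coordinate to a 3D displacement vector."""
    def __init__(self, hidden=256, depth=3):
        super().__init__()
        layers = [SirenLayer(3, hidden)]
        layers += [SirenLayer(hidden, hidden) for _ in range(depth - 1)]
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, 3)

    def forward(self, coords):
        return self.head(self.body(coords))

model = DeformationINR()
coords = torch.rand(1024, 3) * 2 - 1   # sample coordinates in [-1, 1]^3
warped = coords + model(coords)        # warped positions; in registration,
# a similarity loss between the fixed image and the warped moving image
# (plus a smoothness regularizer) would be minimized at these points.
```

Because the network is a continuous function of the coordinates, it can be queried at any resolution, including sub-pixel locations, which is exactly the property the article highlights.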
MIDL Best Poster Award Winner

Leveraging Probabilistic Segmentation Models for Improved Glaucoma Diagnosis: A Clinical Pipeline Approach

Anna Maria Wundram (top) is a master’s thesis student, and Paul Fischer (right) is a PhD student, both under the supervision of Christian Baumgartner. Their innovative approach to glaucoma diagnosis took home the Best Poster Award at MIDL 2024 last month. Anna and Paul are here to tell us more about their impressive research.
In this work, Anna and Paul focus on the uncertainty involved in segmenting two areas of the eye: the optic cup and the optic disc. Segmenting these areas is crucial for diagnosing glaucoma, in which the optic cup, a smaller region within the optic disc, enlarges. “Even experts very often disagree when they’re segmenting these areas,” Anna tells us. “We wanted to model this uncertainty with a probabilistic model and then propagate this learned uncertainty through a pipeline that is very close to clinical practice, to then ultimately predict glaucoma for a patient.”

With this probabilistic modeling of the uncertainty, the team compares how well different probabilistic models perform. They also propose a new feature for predicting glaucoma, extracted from the segmentation: the rim thickness curve (RTC). The RTC measures the distance between the disc and cup boundaries at every point around the rim. As the optic cup is bigger in glaucoma patients, a thinner rim suggests glaucoma. While doctors use various diagnostic tools, this method is an excellent first screening approach.

From a machine learning perspective, predicting glaucoma directly from images might seem more straightforward; however, the team found that this pipeline, which mirrors clinical workflows, performs better. “The challenge here was to find a way to retain performance but also make it more accessible to clinicians so that they trust the predictions, because it’s close to how they work,” Paul explains. “That’s the challenge we faced at the beginning of this problem. How do we get doctors to trust this mechanism that we have?”
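To make the rim thickness curve concrete, here is one way such a feature could be computed, as a minimal sketch under our own assumptions (binary masks, a shared disc center, simple ray marching); it is not the authors' implementation.

```python
# Illustrative sketch: compute a rim thickness curve (RTC) from binary
# optic-disc and optic-cup masks. For each angle around the disc center,
# rim thickness = disc radius - cup radius.
import numpy as np

def rim_thickness_curve(disc_mask, cup_mask, n_angles=360):
    ys, xs = np.nonzero(disc_mask)
    cy, cx = ys.mean(), xs.mean()   # disc center (assumed shared with the cup)

    def radius_at(mask, theta):
        # March outward along direction theta until leaving the mask.
        r = 0.0
        while True:
            y = int(round(cy + r * np.sin(theta)))
            x = int(round(cx + r * np.cos(theta)))
            if (y < 0 or y >= mask.shape[0] or x < 0 or x >= mask.shape[1]
                    or not mask[y, x]):
                return r
            r += 0.5

    thetas = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    disc_r = np.array([radius_at(disc_mask, t) for t in thetas])
    cup_r = np.array([radius_at(cup_mask, t) for t in thetas])
    return disc_r - cup_r   # a thin rim (small values) suggests glaucoma
```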
Anna adds: “Clinicians disagree so much on where exactly they want to segment these two areas. This is why it’s very important in this case for doctors to trust what the machine is doing and to introduce and model this uncertainty, so they don’t just have a black box where you have an input image and the output, but they have this pipeline, which is more insightful, and they have this uncertainty output in the end!”

The probabilistic nature of the model acknowledges the inherent uncertainties in medical segmentation and diagnosis. It is important to model multiple possibilities so the doctor can do a second check and choose which segmentation to use. You get the best of both worlds with doctors and machines working together.

Although this work focuses on medical imaging, the general applicability of the segmentation framework and its use of uncertainty could extend beyond it. Segmentation is crucial in various fields, such as autonomous driving. The problem is strongly clinically inspired but not restricted to the clinic.

Winning the Best Poster Award was a moment of pride for the team. “Of course, we’re all very happy!” Anna smiles. “We’re very grateful for the recognition of our work and happy that the way we presented it was easy to follow.”
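The "multiple possibilities" idea above can be made concrete with a generic sketch. The paper compares several probabilistic models; the version below uses Monte Carlo dropout purely as an illustrative stand-in and is not the team's implementation.

```python
# Generic sketch: draw several plausible segmentations from a network with
# dropout and use per-pixel disagreement as an uncertainty map.
import torch

def sample_segmentations(model, image, n_samples=8):
    model.train()  # keep dropout active so each forward pass differs
    with torch.no_grad():
        samples = torch.stack([model(image).softmax(dim=1)
                               for _ in range(n_samples)])
    mean_seg = samples.mean(dim=0)     # consensus segmentation
    uncertainty = samples.var(dim=0)   # high where the samples disagree
    return mean_seg, uncertainty
```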
Paul attributes their success to the relatable nature of the problem. “It was easy to grasp because it comes directly from clinical practice,” he points out. “To understand our contribution and the problem in the first place, you don’t need to know the technical details of each step, but the bigger picture is very relatable and understandable. That’s why I think it was a bit easier to present it to people so that they understand. Sometimes, when I’m listening to poster presentations, they’re hard to follow because they’re just so technical. We had this high-level problem, and you could communicate it more easily to people.”

Looking ahead, Anna says that while these models are expressive and show the desired uncertainty, sometimes they tend to be overconfident in their predictions, which is a key improvement area. “That’s what we’ve noticed, especially with the better-performing models,” she reveals. “If we could fix that and adjust this confidence a bit, that would be amazing!”

While other works highlight uncertainty as just one part of the picture alongside other things, their work focuses on incorporating it into actual clinical practice and prompts others to explore similar ideas. “Usually, if you ask people, they agree uncertainty is important, but what do we actually do with it?” Paul asks. “This is what we showed in this paper. Maybe other people will come up with nice ideas, and I hope this contributes to that.” Look out for more papers from the team at MICCAI in October.

Anna and Paul are keen to emphasize the collaborative nature of their research, working not only as computer scientists but also closely with the clinicians who initially identified the challenges in cup and disc segmentation. “Without the clinicians, without getting what they’re actually struggling with in practice, we wouldn’t have had this idea,” Paul stresses. “I think it’s very important to mention the clinical side of this as well!”
Read 160 FASCINATING interviews with Women in Science
Women in Computer Vision: Alice Lucas

Alice Lucas is a computer vision engineer at Meero in Paris.

What do you do at Meero?
I work in a team of developers. We’re what we call a feature team, where we try to deploy features to clients as fast as possible. My specific role is to develop and deploy computer vision algorithms that are going to serve our customers’ needs. Specifically, we work on a product for real estate agents. Those are our target clients. We’re going to automatically enhance and edit their images such that they are more sellable.

Is that what you wanted to do, or is that the thing that found you?
Well, I guess the product part is the thing that found me. [laughs] I don’t have any particular love for real estate, but we work on algorithms such as image enhancement and editing, which since the PhD has been the family of algorithms that I’ve worked with.

Hosting someone who has not gone down the typical academic path is very refreshing. What is your story?
At first, it was kind of traditional, I guess. Bachelor, then combined master’s and PhD at Northwestern. I mean, I do love research, and I tend to romanticize academia a little bit. I enjoyed being an expert in this one field and really knowing what I’m talking about and learning about something every day and writing and all of that, but I also do have this drive to get things done. Even if I think about my PhD work, sometimes it has this feeling it’s a bit of incremental work. The competition aspect is not something I’m that comfortable with. The whole number of citations and the conferences, it’s something that’s a bit overwhelming, even if I have the love for what I do. It’s just a nicer feeling to be able to be in an industry setting and just deploy things and really have your thing being used by the client, getting client feedback, and iterating over that. I kind of like that!

Is it not also very competitive to work with clients? You might have competitors who promise to do a better job than you.
It is. I guess maybe it’s because it’s less individualistic. You know, I’m part of a team. I feel like there’s this team effort. It’s not just me; it’s the other developers, also. I have two other computer vision engineers working with me, so the three of us, maybe there’s this collective aspect that makes us feel stronger against the competition.

Now, Alice, we want to know your citation number!
[laughs] Go check it out on Google
Scholar. I’ll leave that to you!

Have you ever had any second thoughts about the route you have taken?
Well, every time I’m sitting down and reading a paper, or reading about something new in the field, or we went to a conference as part of my company just last year, and seeing all the exciting stuff that’s happening, especially right now, in machine learning and AI, it makes me want to go back, but this is maybe more of a ‘the grass is always greener’ type of effect where you think that somewhere else is always better. In my job, the nice thing with industry is, sure, there’s practical research, but the research is still there. You still get to have your journal club, you still get to talk about the advancements in science, you still get to be a part of it, just maybe not at the center of it, but you’re still learning every day, which is honestly all I care about at the end of the day.

As an academic researcher, the future seems slightly more obvious: have a lot of students, help them, promote them, and advance science in one way or another. What are your objectives until the end of your career?
I guess I want to be a CTO at a company. If I’m already on that path, I might as well do it all, right? [she laughs] We’ll see what happens.

It cannot be a goal by itself to be a CTO. You must want to do something.
Well, yeah, for sure, but even if I
think about the career path with academia, the one you just mentioned about having students and mentoring students, I think that would also have been great. I really do think there are just many different parallel paths that life can provide to you, and it kind of always works out. Whether it’s industry or academia, I think you can always find a way to make yourself happy.

Some pillars of this community, like Yann LeCun, are champions of double affiliation. He is at the same time a professor at NYU and the leader of Meta AI. Those are two big responsibilities, and he is very serious about both of them. Many people do it. What do you think of that?
Yeah, I like it. It sounds to me like this could be the best of both worlds, right? Not only do you have something that’s concrete and applied, and you participate a lot in the industry, but you’re also super close to the actual core research.

Should we infer that Alice is available for a tenure track at some great university?
Let’s do it. Let’s consider it! [she laughs]

I heard you also went abroad for a few years.
Yeah, for more than a few years. I was born and raised in France, but then all of my higher education, past high school, so university, master’s, and PhD, I was in the US for that. Then I did two years at the Broad Institute.

Why did you make the brave choice to move far away from your nest?
Hmm, that’s a good question. Thank you. [Alice laughs] It’s interesting because people tell me that, but it did not feel brave at all at the time. It just felt like the one thing to explore.
That’s brave. Any bird that quits its nest is brave.
Right. Even if it has had its challenges, I don’t regret it at all because I firmly believe that I would not be the same person today if I had made the choice of staying close to home. Even if I’m now back at home today, I’m just happy that I had a whole decade in early adulthood to get out of my shell.

What was special in your years with Anne Carpenter at Broad?
The place itself is special. The application itself is pretty cool. We get to really work with the science and use our knowledge. What was special about that particular position for me was that I was part of a lab. With me, there was only one software engineer, but the rest of us were postdoc people and research people. It’s kind of like in between. I love research, but my objective was actually more development-based. It was to develop the tools for the researchers. It was the first time I was this close to biology. I’ve learned a lot during my time there, and every time still today that I see a recent news advancement that’s at the boundary of biology and machine learning, I’m happy I was part of that for a second.

The Institute is often in the news for good reasons. It is good to know that it is still doing great things. How did you choose them?
I saw them, and I guess they chose me. They believed in me. I guess it was mutual. [laughs] It was.

Let’s go back further. When did you decide you wanted to be in this area of transforming reality into algorithms?
I don’t really know how I landed in this area today. I was always into science in general. I was always into the scientific path. When I started university, I wanted to study physics, then I discovered image processing
and a whole world of images and how they can change and everything that’s hidden, I guess. That’s how I got into the image and video processing lab at Northwestern, and that’s how I learned about deep learning. At the time, deep learning was this new wave, and everything felt new and exciting. It was amazing to be part of that at that time. I just enjoyed working with images - making them look better, making them look different - and especially now, today, with all the new methods with diffusion-based models, there are new opportunities that were not even conceivable before. I think it’s a great time to live in!

What would make you overwhelmingly grateful and happy if it happened in the coming decade?
I’m actually working on the opposite - being grateful for what I have today. [she laughs] I will be grateful in the coming decade if I stay happy and healthy and keep learning in my career. That’s honestly all I’m hoping for.

What have you learned in all this time that you could teach our readers in a couple of sentences?
I learned that everything always works out. When you don’t get into that conference, in that lab, at that job, or whatever it is that you really, really thought you needed or wanted, it always works out, and you always do find an alternative. It sounds so cheesy to always say things happen for a reason, but I really think that it always works out.

Do you not miss the deadlines for submitting papers?
No, I proudly call myself a very
organized person, so I’ve never been an all-nighter type of person. You tell me there’s a deadline in two months, and I’m on it the day you tell me. [laughs] I’ve always tried to remove this time pressure by attacking problems right away.

You might give this idea to some of our readers - maybe some of them are not as good as you are at doing this. I think it’s just a temperament thing. Some people, as soon as they send their paper, say, ‘When is the next deadline?’
Yeah, exactly.

Of all the things your teachers have taught you, could you pick one thing to tell our readers about?
I can think back to my job at the Broad Institute, where it’s not so much teaching but more in the context of industry and culture. Actually, it’s a lab, so it’s the culture of the lab. There, they really made an effort to make sure everybody was welcomed. It just felt like a place where you see the value in each person, and you set an environment where a person can grow and be themselves. They were really good at handling conflict and making sure that it was safe for everybody and that the best work could be produced at the end of the day for science, without any competition or toxicity in the lab. They really set a great example of getting to that culture. So that’s something that I want to take with me. If one day I’m leading a lab or in some kind of leadership position, I really want to make sure that I establish a good culture for the people I’m responsible for.

How many computer vision people do you have at Meero?
In total, if we expand beyond this real estate product - because we have more products - we’re about 12.

Do you also go to conferences and meet some of your friends from years ago when you were a student?
I haven’t recently because my friends from years ago are so far away now. There’s a pond that’s separating us.

Come to CVPR!
[Alice laughs] Yeah, it’s actually something that we’re going to start doing more starting this year. It’s so easy to have your mind be so focused
on just getting things done, and then if you don’t keep up with the research, well, it just goes so fast that you can be overwhelmed quite quickly. It’s definitely our objective to be there at the next conferences.

Is there another great professor you would like to mention?
I don’t want to forget to mention my advisor from Northwestern, Professor Aggelos Katsaggelos, who actually was amazing. Thinking back about those years, what I think about him is how much trust he put in me. When I think about myself growing and learning, it’s a lot thanks to him. He trusted me with being in a position where he let me take control of things and take risks. He was always open to my ideas. There was a lot of trust, and he was a very good influence for me.

Did he make a good choice?
Yes, I believe so. [laughs]

How did he know to make this good choice?
Yeah, it’s a good question. How would he know, because it was just a paper application? I’m not quite sure.

It was just an earlier version of you that he didn’t know much about.
Yeah, maybe. I don’t remember what I wrote in my essay, but I’m quite a good writer, so it could have been a convincing read. Who knows? [she laughs]

Read 160 FASCINATING interviews with Women in Computer Vision!
Grand Challenge - Medical Imaging

PANORAMA (Pancreatic Cancer Diagnosis: Radiologists Meet AI)

Megan Schuurmans (pictured bottom right with co-organizer Natália Alves) is a third-year PhD candidate in the Diagnostic Image Analysis Group (DIAG) at Radboudumc, Nijmegen. John Hermans (left) is an abdominal radiologist with a background in engineering and medicine. They are here to tell us more about the PANORAMA challenge, the first AI grand challenge for pancreatic cancer detection. This challenge is endorsed by the MICCAI Society.

Pancreatic cancer remains one of the most formidable types of cancer, with survival rates showing little improvement over the past four decades. In stark contrast to advances seen in the treatment of other types of cancer, like colon and breast cancers, pancreatic cancer continues to carry a grim prognosis and is often diagnosed at an advanced stage, earning it a reputation as a hidden killer.
“When you look at the clinical problem now, only 20% of patients can be resected, but 80% cannot be resected,” John tells us. “All those patients get a CT scan at the first time of diagnosis. We looked at the literature and knew by experience that these patients might have had previous scans – and when you look at those previous scans, you can already see changes in the tissue indicating that something could be wrong. That’s the group we’re focusing on, and that number could be quite high.”

Studies suggest that around 40-60% of patients with pancreatic cancer had previous CT scans that could have indicated early tissue changes. Routine CT scans are particularly important here because, unlike some other cancers, there are no biomarkers in the blood or urine to indicate pancreatic cancer. Without more focused or niche research, CT scans are the only way to identify potential malignancies. “Many people are working on finding a biomarker that will be the holy grail for pancreatic cancer,” John reveals. “There have been some promising ideas, but nothing applicable in the clinic yet.”

PANORAMA is the first-ever large-scale study comparing radiologists
and AI for pancreatic cancer detection. It aims to establish how well radiologists currently perform in detecting pancreatic cancer using contrast-enhanced CTs. This benchmarking is crucial as it sets a standard for AI systems to match or exceed. Furthermore, the challenge seeks to develop AI capable of detecting pancreatic cancer, ideally outperforming or at least matching the expertise of radiologists. It is being organized by PANCAIM, a consortium developing AI to improve pancreatic cancer diagnosis, prognosis, and treatment. Megan was hired to develop multimodal AI. “We were granted Horizon 2020 funding to see if we can make an impact with AI on the very difficult topic of pancreatic cancer,” she recalls. “This is the start to see where AI could be impactful in the future, hopefully, in a sort of opportunistic screening setting in which we can have AI running in the background of contrast-enhanced CTs to see if we can catch these cancers a bit earlier.”

AI’s potential in this context is significant. Radiologists will not need to check every contrast-enhanced CT because AI can flag any that warrant further investigation, including those not intended to investigate the pancreas. It could be especially beneficial in peripheral hospitals and non-specialist settings, where expertise in identifying subtle early signs of pancreatic cancer may be limited. “The highest level of achievement we could get is, let’s say, we develop this AI and get it running in every hospital that uses CT,” John suggests. “Everybody can use it in the background, and every CT scan will be screened for pancreatic cancer, and there will be some kind of red flag saying this pancreas is not normal and an expert or non-expert should have a look at it. Then you can pick out the cases earlier, which will be the ideal situation.”

John’s main research focus is pancreatic imaging. He develops new techniques to better understand how the pancreas functions and what kinds of tumors there are, using advanced imaging techniques to gain insights into the disease. In the last 10 years, he tells us, there has been a fundamental shift toward using AI in medical image analysis.
The PANORAMA challenge is making steady progress. Megan reports that the training set was released less than a month ago, and teams can now submit their algorithms to a hidden validation set on the Grand Challenge platform. This cohort of 100 cases allows teams to tune their algorithms without prior exposure to the cases. A more extensive hidden test set will be available to developers later in the year, with a subset of this used for a parallel reader study.

Do the pair have any advice for competing teams? “My tip would be not just to focus on the presence or absence of tumor morphology but also on surrounding structures,” Megan teases. “Contributing with a pre-processing step that segments the anatomy surrounding the pancreas or the surrounding organs could be very useful.” Changes in surrounding anatomy and indicators like sarcopenia can provide crucial insights into pancreatic cancer. John’s research supports a multifaceted approach, suggesting that integrating additional clinical information with imaging data will enhance AI performance in the future. “It can be clinical information on the patient himself, like their physical condition, age, or previous diseases, but also, what is the pathology outcome? Has the patient received chemotherapy?” he considers. “More and more, you will get additional information added to the imaging.”

With pancreatic cancer projected to become the second leading cause of cancer-related deaths in the US by 2030, developing impactful AI solutions to improve early detection and outcomes for patients is becoming more and more urgent. “It’s going to happen quite quickly because of developments in other cancers,” Megan points out. “It’s particularly important to change outcomes in this field that haven’t changed in the last 40 years. If people are eager to participate and centers can contribute data to make this more generically applicable, that would be great.”

In addition to cash prizes, winning teams will be invited to join the PANORAMA consortium and listed as authors on an upcoming journal paper summarizing the challenge’s findings. Previous challenges like PICAI and PANDA from the Diagnostic Image Analysis Group have produced high-impact papers, and this will be no exception. Participants have an exciting opportunity to contribute to something unique and influential.

Pictured: co-organizer Henkjan Huisman
Doctor soon-to-be Marilyn!

Marilyn Keller is a soon-graduating PhD student in the Perceiving Systems department at the Max Planck Institute for Intelligent Systems. She is supervised by Sergi Pujades and Michael J. Black, and her research focuses on inferring people's anatomy solely from their external body shape. Marilyn received an Honorable Mention for Best Paper at SIGGRAPH Asia 2023 and an Honorable Mention at the 2024 Outstanding Female Doctoral Student Prize from @MPI_IS for her PhD work. Marilyn is looking for post-PhD opportunities, and she’s a great catch: both academia and industry can still get her, so talk to her now!

Capturing the human anatomy is key in many fields: in medicine, it helps diagnose patients and run simulations; in graphics, it helps generate a more visually plausible appearance for digital humans; and in computer vision, it can yield priors on how the human body can move. But capturing someone’s anatomy is a difficult task; it often relies on expensive medical imaging and the intervention of experts to annotate the captured data. Marilyn’s thesis aims to directly infer people's anatomy from their external body shape, instead of using expensive capturing devices. In particular, she focuses on modeling the bones and the soft tissues.

In the first work of her thesis, she addressed the challenge of generating, given a body shape, the corresponding 3D skeleton. This was done by collecting thousands of full-body medical scans called DXA and lifting these data to 3D to create a paired dataset of 3D body shapes with the corresponding 3D skeleton for each subject. With such data, a linear regressor can be trained to learn the correlation between the body shape and the skeleton shape, enabling the generation of a custom skeleton for an unseen subject (see the sketch below).
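As a rough illustration of the shape-to-skeleton regression just described, here is a minimal sketch. The vertex counts, the Ridge regularizer, and the random placeholder data are our assumptions, not Marilyn's released code.

```python
# Illustrative sketch: fit a linear map from flattened 3D body-surface
# vertices to 3D skeleton vertices, given a paired dataset.
import numpy as np
from sklearn.linear_model import Ridge

n_subjects = 1000                # hypothetical dataset size
n_body, n_bones = 6890, 2000     # hypothetical vertex counts

X = np.random.randn(n_subjects, n_body * 3)    # body shapes, flattened
Y = np.random.randn(n_subjects, n_bones * 3)   # paired skeletons, flattened

reg = Ridge(alpha=1.0).fit(X, Y)               # linear shape-to-skeleton map

new_body = np.random.randn(1, n_body * 3)      # an unseen subject's shape
pred_skeleton = reg.predict(new_body).reshape(n_bones, 3)
```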
The limitation of medical imaging is that it only shows static bodies. So, to learn where bones are inside posed bodies, she then used motion capture datasets and biomechanical models to create a paired dataset of people in motion with the skeleton inside. From these data, she built SKEL, a body model that, given pose and shape parameters, outputs a posed body and the corresponding skeleton. Contrary to common body models used in computer vision, like SMPL, SKEL has fewer and more anatomical degrees of freedom, like arm and leg flexion and forearm supination.

After working on bones, Marilyn turned to the challenge of predicting soft tissue layers, in particular subcutaneous adipose tissue (the fat layer under the skin), given the external body shape. She started by annotating MRI scans with the different tissues. Then, learning from these MRIs required leveraging a statistical body model (SMPL) and an implicit representation of the different tissues, i.e., representing each tissue inside the body by an occupancy function defined on ℝ³. The resulting method, called HIT, works as follows. Given a 3D body and a location x ∈ ℝ³, the point x is warped to the average template body, and a multi-layer perceptron is trained to predict the tissue at this location (adipose tissue, lean tissue such as muscles and organs, and bones). A minimal sketch of this step follows below. Marilyn released the implementation code of each project, and you can learn more on her website.
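Here is the promised sketch of the tissue-classification step: an MLP that labels query points after they have been warped into the template body. The architecture, label set, and names are illustrative assumptions, not the HIT implementation.

```python
# Illustrative sketch: classify the tissue at 3D query points.
import torch
import torch.nn as nn

TISSUES = ["empty", "adipose", "lean", "bone"]   # assumed label set

class TissueMLP(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(TISSUES)),     # per-tissue logits
        )

    def forward(self, x_template):
        # x_template: (N, 3) points already warped to the template body
        return self.net(x_template)

model = TissueMLP()
points = torch.rand(4096, 3)            # query locations in R^3 (template space)
tissue = model(points).argmax(dim=-1)   # predicted tissue label per point
```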
CARS 2024 Presentation by Negar Kazemipour

A Usability Analysis of Augmented Reality and Haptics for Surgical Planning

Negar Kazemipour graduated with a master's degree in computer science from Marta Kersten-Oertel’s Applied Perception Lab at Concordia University in Montreal, Canada. Her master's research looked at how MR and haptics can be used in surgical planning. Currently, she works as a software engineer at Zimmer Biomet, developing mixed reality (MR) surgical applications. She recently presented the results of her research at the Computer Assisted Radiology and Surgery (CARS) 2024 conference. The work was also published in IJCARS.

For decades, clinicians have been using 2D slices of medical images to analyze patients' anatomy and plan surgeries. Progress in graphics has allowed clinicians to benefit from 3D visualizations, providing a more comprehensive and intuitive representation of patient data. However, interacting with 3D patient data through 2D screens and input systems is challenging. Therefore, to provide a better spatial understanding of the anatomical data and more degrees of freedom for input, devices like haptic force-feedback tools and MR headsets are increasingly being studied for surgical planning. Still, the usability of these technologies in various surgical planning contexts remains largely unexplored. In this work, we compared surgical planning using a Touch X force-feedback haptics device and a Microsoft HoloLens 2 against conventional planning systems using a 2D monitor, mouse, and keyboard.
We chose three different surgical planning scenarios – mitral valve delineation, hip tumor resection, and pedicle screw placement – to cover planning with soft tissue of complex geometry, hard tissue, and implant placement. Our surgical planning platforms were developed using Unity, and they all provided the same features, like adding and deleting a landmark, navigating through the 3D scene to modify the surgical plan, and saving it. To test the usability of the platforms, we conducted user studies at Concordia and McGill University with non-clinicians (novices) and at Montreal General Hospital with clinicians (experts). We looked at time and at usability based on the NASA-TLX and system-specific questions.

Results of the NASA-TLX questionnaire suggest that, overall, the haptics system has the highest workload for clinicians, which contradicts the results of our novices’ experience and previous studies. Regarding time, results showed AR was the most time-consuming for both groups compared to the other two interfaces. We suspect this may be due to users’ unfamiliarity with MR devices, as an AR-expert surgeon finished planning on the HoloLens faster than other users. We also found that preferences for input and display type varied based on the surgical scenario, suggesting different planning scenarios may benefit from different interaction and visualization methods. For example, for cases involving implant placement, users suggested that AR provides better depth perception and more degrees of freedom for controlling the position and rotation of the implant. In the future, we aim to plan more complex surgeries involving robotic surgery and robot trajectory planning.
Congrats, Doctor Szymon!

Szymon Płotka completed his PhD in June. He conducted his research in the Quantitative Healthcare Analysis group at the Informatics Institute, Faculty of Science, University of Amsterdam, and at the Sano Centre for Computational Medicine. His work was supervised by Clara I. Sánchez, Ivana Išgum, and Arkadiusz Sitek from Harvard Medical School.

A promising application of deep learning in the medical field is prenatal care. This vital aspect of healthcare is provided to pregnant women to prevent complications and ensure the well-being of both mother and baby before, during, and after birth. Prenatal care includes regular check-ups with a healthcare provider to monitor the mother's health and the fetus's growth and development. In his PhD thesis, Szymon proposed novel deep learning-based methods to improve the health and well-being of mothers and fetuses during pregnancy and delivery. Specifically, he focused on developing an automated method for recognizing standard planes and measuring biometrics during routine fetal ultrasound exams. In this work, he compared the performance of deep learning-based methods against experienced clinicians, showing no significant difference between the deep learning-based method and human readers. Additionally, he designed a method for directly estimating fetal birth weight from ultrasound video scans, enhanced by incorporating multimodal data. The training and testing were done on data acquired 24 hours before delivery. This method is crucial for
distinguishing the type of delivery: vaginal or Cesarean section. Building on these findings, he introduced a novel method for predicting fetal weight during pregnancy based exclusively on the fetal abdominal view. This method can reduce bias in the estimation of fetal weight by measuring only one fetal body part. Furthermore, he presented a fast and effective neural network tailored for segmenting and highlighting placental vessels during fetoscopic laser photocoagulation in cases of Twin-to-Twin Transfusion Syndrome (TTTS), aiming to assist surgeons during fetal surgery in clinical environments. The proposed method may aid surgeons during real-time fetoscopic fetal surgery to accurately identify critical structures and ultimately improve outcomes of TTTS treatments.

Figure: sample US frames extracted from fetal US videos, depicting standard planes of the fetal head, abdomen, and femur, displayed from left to right.

To conclude, Szymon's contributions have the potential to significantly advance prenatal care by providing more accurate and efficient tools for monitoring and assessing fetal health. His work exemplifies the intersection of deep learning and medical research, offering promising solutions to critical challenges in maternal and fetal medicine.

Computer Vision News
Publisher: RSIP Vision
Copyright: RSIP Vision
Editor: Ralph Anzarouth
All rights reserved. Unauthorized reproduction is strictly forbidden.
Our editorial choices are fully independent from IEEE, CVPR and all conference organizers.
Olympics 1924 in Paris with AI

Yes, you read that right: we are talking about the past Olympics, which took place in Paris exactly 100 years ago this month. What’s the connection with AI? Well, a lovely video was uploaded on YouTube, made with Alibaba Cloud technology, adding glorious colors to the images from the Olympic Games of one century ago. Besides the lovely recolored images of past sports champions, don’t miss Josephine Baker at 0’22”! Traveling back with this video, who’s your favorite Olympic legend? Athlete Harold Abrahams or tennis star Helen Wills? If you ask me, my favorite champion is Finnish athlete Paavo Nurmi at 1’31”. Sometime during the last century, the young runner that I was visited the Helsinki Olympic Stadium and its imposing tower, not without noticing nearby Paavo Nurmi’s whole-body statue. This statue was commissioned by the Finnish government right after the Paris Olympics shown in this video. He and Finnish compatriot Ville Ritola (1’41”) were the most impressive runners of that competition. I don’t know if it’s the role of our magazine to show you this emotional video, but since it’s very related to AI, I’m happy to do it. You will see real vintage pole vaulting with almost no safety equipment - my father remembers well: he and fellow pole vaulters literally landed on sand bags! Enjoy, it’s only 2’14”. It is quick and awesome!
What’s That? Turn the page and you will know!
Congrats, Doctor Argho!

Argho Sarkar completed his PhD in Information Systems at the University of Maryland, Baltimore County, under the supervision of Maryam Rahnemoonfar. He’s now a Research Fellow in Computational Pathology/Cytology at Memorial Sloan Kettering Cancer Center. His PhD research mainly focused on developing explainable multimodal and uni-modal machine learning approaches for remote sensing applications, especially for damage assessment.

Visual Question Answering (VQA) frameworks have shown remarkable abilities in extracting meaningful information from images through language queries, applicable across various domains like interpreting radiology images, aiding autonomous vehicle navigation, and improving surveillance systems. Despite this potential, VQA technology has not been fully utilized for damage assessment in rapid search and rescue operations. Argho's PhD work pioneers the application of VQA frameworks to comprehensive damage assessment. VQA enables the extraction of diverse information from UAV and satellite images through natural language queries. This high-level scene information in real time facilitates rapid damage estimation, leading to increased efficiency and a reduction in the time required for search and rescue operations. Early response to affected areas is vital
for saving lives, providing medical assistance, and conducting evacuation efforts. On the other hand, the integration of machine learning models into smart decision support systems raises concerns about model explanation. In remote sensing, limited contextual information can lead to shortcut learning, which produces accurate results with false explanations. Addressing these challenges, Argho's thesis focuses on two key aspects. Firstly, it develops a question-answering framework for efficient damage assessment using remote sensing imagery. Secondly, it aims to enhance the trustworthiness of model outcomes by developing novel machine learning frameworks tailored for remote sensing in both multi-modal and uni-modal contexts.

To achieve this goal, Argho and the BINA Lab introduce two large-scale benchmark visual question-answering datasets for damage assessment, named FloodNet-VQA and RescueNet-VQA. These are the only existing datasets for developing visual question answering frameworks for damage assessment, providing new opportunities for the AI research community. In later work, a supervised attention-based framework named SAM-VQA (Supervised Attention Module for Visual Question Answering for Post-Disaster Damage Assessment on Remote Sensing Imagery) is proposed. This framework models the image and question together to provide accurate answers with rational visual explanations. It uses manually annotated visual masks, highlighting the image regions relevant to answering a given question, to supervise the attention-obtaining process (a schematic sketch follows at the end of this article). SAM-VQA offers improved explanations and achieves higher accuracy compared to state-of-the-art VQA algorithms. In his last work, Argho proposes a novel learning strategy for consistent and robust visual explanations in image classification tasks for remote sensing. This strategy introduces two distinct loss functions designed to ensure consistency and robustness in visual explanations. The integration of these losses enables the model to learn better visual features than baseline convolutional architectures, resulting in higher accuracy and enhanced visual explanations.

In summary, Argho’s research makes significant contributions to damage assessment and enhances the reliability of model outcomes in remote sensing applications. You can find Argho here. Congrats, Doctor Argho!
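As for the schematic sketch promised above: one simple way to supervise attention with annotated masks is to add a divergence term between the model's attention map and the human-annotated relevance mask to the answer loss. Everything below (function names, the KL-divergence choice, the weight lam) is our own assumption for illustration, not the authors' code.

```python
# Schematic sketch of attention supervision in the spirit of SAM-VQA.
import torch
import torch.nn.functional as F

def attention_supervised_loss(answer_logits, answer_target, attn_map, mask,
                              lam=1.0):
    # Standard answer classification loss.
    ce = F.cross_entropy(answer_logits, answer_target)
    # Normalize attention and mask into distributions over spatial locations.
    attn = attn_map.flatten(1).log_softmax(dim=1)
    target = mask.flatten(1) / mask.flatten(1).sum(dim=1, keepdim=True)
    # KL divergence pulls the attention toward the annotated relevant regions.
    kl = F.kl_div(attn, target, reduction="batchmean")
    return ce + lam * kl
```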