
[Feature]: Option to divide snaphu unwrapping into tiles #478

Closed
SteffanDavies opened this issue Oct 10, 2022 · 85 comments
Labels
enhancement New feature or request

Comments

@SteffanDavies
Contributor

SteffanDavies commented Oct 10, 2022

Completing unwrapping of very large areas can take many days or even weeks when working with large numbers of interferograms, even when unwrapping in parallel. This is a serious bottleneck for GMTSAR. Snaphu has optional tiling parameters which are compatible with multithreading, and unwrapping of multiple smaller tiles will generally be much faster than a single large product.

Some tiling parameters available for Snaphu, according to the documentation:

--nproc n

Use n parallel processes when in tile mode. The program forks a new process for each tile so that tiles can be unwrapped in parallel; at most n processes will run concurrently. Forking is done before data are read. The standard output streams of child processes are directed to log files in the temporary tile directory.

--tile ntilerow ntilecol rowovrlp colovrlp

Unwrap the interferogram in tile mode. The interferogram is partitioned into ntilerow by ntilecol tiles, each of which is unwrapped independently. Tiles overlap by rowovrlp and colovrlp pixels in the row and column directions. The tiles are then segmented into reliable regions based on the cost functions, and the regions are reassembled. The program creates a subdirectory for temporary files in the directory of the eventual output file (see the --tiledir and -k options). Tiles can be unwrapped in parallel (see the --nproc option).

-S

Do single-tile re-optimization after tile-mode initialization. If this option is specified, snaphu will run in tile mode to generate an unwrapped solution, which is then used as the initialization to a single-tile optimization that produces the final unwrapped output. The tile-mode initialization may itself be initialized by the MST or MCF algorithms (or an input unwrapped phase file) as normal. This option is equivalent to running an instance of snaphu in tile mode, then running another instance of snaphu taking the tile-mode output as an unwrapped input via the -u option. Tile parameters must be specified when using this option. This approach is often faster than unwrapping an interferogram as a single tile from an MST initialization, especially if multiple processors are used.

I'm currently not sure how tiling could affect unwrapping quality, especially at tile borders. However, if this option were available it would be simple to run some tests, and it would be optional anyway.

Tiling would also help with RAM bottlenecks. For example, I am currently unwrapping a large area which requires ~10 GB of RAM for each interferogram. As I only have 250 GB of RAM, this limits the number of CPUs I can use to around 20, wasting the other 12 available cores.

I would suggest adding additional optional arguments to unwrap_parallel.csh, which would relate to the above Snaphu options. Thus it would be possible to, for example, divide each interferogram into 8 tiles and run 4 parallel snaphu processes for a total of 32 cores or some other combination.
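For illustration only, a hedged sketch of what such a tiled snaphu call could look like (the 4x4 tiling, 400-pixel overlaps, the 5640-sample line length and the file names are illustrative; phase.in / corr.in / unwrap.out follow the naming used by the existing GMTSAR snaphu scripts):

# unwrap in 4x4 tiles with 400-pixel overlaps, 8 tile processes in parallel,
# then single-tile re-optimization; 5640 is the number of samples per line
snaphu -s phase.in 5640 -c corr.in -o unwrap.out --tile 4 4 400 400 --nproc 8 -S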

@Xiaohua-Eric-Xu Xiaohua-Eric-Xu added the enhancement New feature or request label Oct 10, 2022
@Xiaohua-Eric-Xu
Member

From my understanding of what SNAPHU does, it has to be a global solution (an optimization problem). The tile mode seems to divide the interferogram and unwrap the pieces with some overlap. It may not affect unwrapping when the images are of good quality, but may be problematic when lots of residues are on the border. I am not sure whether it'll work perfectly, and this'll require lots of testing.

@SteffanDavies
Contributor Author

From my understanding of what SNAPHU does, it has to be a global solution (an optimization problem). The tile mode seems to divide the interferogram and unwrap the pieces with some overlap. It may not affect unwrapping when the images are of good quality, but may be problematic when lots of residues are on the border. I am not sure whether it'll work perfectly, and this'll require lots of testing.

If this gets implemented in GMTSAR I will definitely run some comparisons soon considering I'm overpaying for unused resources at the moment.

@Xiaohua-Eric-Xu
Member

Xiaohua-Eric-Xu commented Oct 10, 2022

Looks sort of OK but with 2x2 tiles. However it's not even reducing the time by a factor of two. (11min vs 6min) Could be problem size related.

image

snaphu_interp.csh.txt

@SteffanDavies
Contributor Author

Looks sort of OK but with 2x2 tiles. However it's not even reducing the time by a factor of two. (11min vs 6min) Could be problem size related.

image

snaphu_interp.csh.txt

I would still consider that a significant improvement considering I've been unwrapping for a week now. Could be the difference between 11 and 6 days (!)

@Xiaohua-Eric-Xu
Member

The script I sent has some parameters built in. It requires some customization to run properly for a targeted problem.

@AlexeyPechnikov
Contributor

I'm currently not sure how tiling could affect unwrapping quality, especially at tile borders.

It works fine and much faster when we have enough tile overlap (ROWOVRLP and COLOVRLP). 400 pixels is the recommended overlap, but for faster processing a 200-pixel overlap works in many cases (the command below is targeted at the Google Colab environment):

conf = sbas.PRM().snaphu_config(defomax=DEFOMAX, NTILEROW=1, NTILECOL=8, ROWOVRLP=200, COLOVRLP=200)

I think the main reasons to use tiled unwrapping are RAM management and the speedup. We can use all 8 cores together for multiple tiled unwrapping processes on an Apple iMac with 16 GB RAM and get the results ~10-100 times faster depending on the area.

See the live example on Google Colab:
https://colab.research.google.com/drive/12LJqlZNBUmvLlRl98rRFCbKveVPg9Ami?usp=sharing

@SteffanDavies
Contributor Author

Official support for this would be a huge improvement for both low end and high end systems.

Low RAM? Divide into more tiles.
High CPU count? Divide into more tiles.

@AlexeyPechnikov
Contributor

Official support for this would be a huge improvement for both low end and high end systems.

Yes, that's correct. SNAPHU memory management is straightforward: it always allocates the full amount of required memory, and that's costly everywhere. Split the entire image across N tiles and N CPU cores and you need to manage less than 1/N of the full memory. Try it on a no-swap system like a Docker container and you will see the source of the problem: https://hub.docker.com/repository/docker/mobigroup/pygmtsar

@SteffanDavies
Contributor Author

@mobigroup Where did you get the information about the recommended 400-pixel overlap? For StaMPS I usually used a 200 / 50 overlap in azimuth / range respectively.

If there is a recommended 400-pixel overlap, then it would be important to know if the user is dividing into too many tiles for a given AOI. A small AOI might not have enough pixels to properly divide into N tiles. This could be checked by finding the XY extents and adjusting ntilerow and ntilecol to a maximum accordingly.
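A rough check is possible by reading the grid dimensions with gmt grdinfo -C (a hedged bash sketch; the file name and candidate tiling values are illustrative only):

# compare a candidate tiling against the grid dimensions
ncols=$(gmt grdinfo -C phasefilt.grd | cut -f10)
nrows=$(gmt grdinfo -C phasefilt.grd | cut -f11)
ntilerow=4; ntilecol=4; ovrlp=400
echo "tile size ~ $((nrows / ntilerow)) x $((ncols / ntilecol)) pixels, overlap $ovrlp"
# if a tile dimension is not much larger than the overlap, reduce ntilerow/ntilecol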

@AlexeyPechnikov
Contributor

@SteffanDavies As I remember, SNAPHU shows a warning when you use a smaller overlap. In my experience this value is really robust but costly, and we can often use a 200 or even 100 pixel overlap with some small visible issues.

@AlexeyPechnikov
Contributor

@SteffanDavies We don't need tiling for a small AOI, of course. As you may remember, I shared with you example PyGMTSAR notebooks for an ~800-interferogram stack where tiling is really helpful for grids like the one below (cropped using pins) and ~3x larger without pinning:

gmt grdinfo F12_20170111_20170204_phasefilt.grd 
...
F12_20170111_20170204_phasefilt.grd: x_min: 0 x_max: 45120 x_inc: 8 name: x n_columns: 5640
F12_20170111_20170204_phasefilt.grd: y_min: 0 y_max: 5022 y_inc: 2 name: y n_rows: 2511
...

Without tiling you shouldn't expect to process all the data in one day on Apple Air :)

@SteffanDavies
Contributor Author

SteffanDavies commented Oct 11, 2022

@mobigroup I see. Maybe GMTSAR could default to 400 if no value is specified. Even at 400 it would still be a massive improvement for a very large AOI (currently I'm working with 20,000 x 10,000) if divided into, for example, 4 rows and 4 columns.

@AlexeyPechnikov
Contributor

@SteffanDavies I think that's a bad assumption for everyone! SNAPHU tiling is valuable only with well-selected tiles. In many cases, when decimation is used or just a single subswath (or pinned subswaths) is processed, SNAPHU tiling can produce weird artifacts, and sometimes it works slowly.

@AlexeyPechnikov
Contributor

AlexeyPechnikov commented Oct 11, 2022

@SteffanDavies Also, we need to define the number of CPU cores for the tile-based processing, and that's tricky for parallel unwrapping. On an 8-core CPU host, do you prefer to have 4 unwrapping processes with 2 threads each, or maybe vice versa?.. I sometimes use 4 and 4 (16 cores required in total, but there is a trick) because it's much faster :)

@SteffanDavies
Contributor Author

SteffanDavies commented Oct 11, 2022

@SteffanDavies Also, we need to define the number of CPU cores for the tile-based processing, and that's tricky for parallel unwrapping. On an 8-core CPU host, do you prefer to have 4 unwrapping processes with 2 threads each, or maybe vice versa?..

unwrap_parallel.csh already requires the user to define the number of cores for parallel unwrapping.

Maximum number of available cores can be obtained by:

number_of_cpus=$(lscpu | grep "CPU(s):" | head -n 1 | awk '{print $2}')

unwrap_parallel.csh should assume the user wants no tiling in snaphu by default, of course. A failsafe can be implemented by multiplying the unwrap_parallel.csh core count by snaphu --nproc and checking the product against the available cores.

If the unwrap_parallel.csh core count is 1 and snaphu --nproc is > 0 and < number_of_cpus, then unwrap sequentially using multithreaded tiling.
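A hedged sketch of such a failsafe (the variable names are illustrative; ncores stands for the unwrap_parallel.csh core count and nproc for the snaphu --nproc value):

ncores=4        # parallel unwrapping jobs requested from unwrap_parallel.csh
nproc=8         # snaphu --nproc tile processes per interferogram
number_of_cpus=$(lscpu | grep "CPU(s):" | head -n 1 | awk '{print $2}')
if (( ncores * nproc > number_of_cpus )); then
    echo "warning: $((ncores * nproc)) threads requested, only $number_of_cpus CPUs available"
fi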

@AlexeyPechnikov
Contributor

@SteffanDavies But how are you going to define the best SNAPHU NPROC parameter for all possible cases? The RAM consumption and CPU core load should be accounted for in the actual case.

@SteffanDavies
Contributor Author

@SteffanDavies But how are you going to define the best SNAPHU NPROC parameter for all possible cases? The RAM consumption and CPU core load should be accounted for in the actual case.

Shouldn't this be up to the user, considering it is entirely optional? It is already possible to run out of RAM by using too many CPUs in unwrap_parallel.csh, so I usually divide the available RAM by 12 GB per thread for a large AOI, or less for a smaller AOI.

export number_of_cpus=$(lscpu | grep "CPU(s):" | head -n 1 | awk '{print $2}')
export total_memory=$(awk '/MemTotal/ { printf "%i \n", $2/1024/1024 }' /proc/meminfo)
export unwrap_processes=$(python3 -c "print($total_memory // 12)")
if (( $unwrap_processes > $number_of_cpus )); then unwrap_processes=$number_of_cpus; fi

@AlexeyPechnikov
Contributor

AlexeyPechnikov commented Oct 11, 2022

@SteffanDavies In my example notebooks I define these parameters separately for each case, with lots of notes for the user. The target is a configuration where all your CPU cores are busy and your RAM amount is sufficient. I'm sure you can't do it for every case automatically because it depends on the CPU core count, the RAM amount, the CPU speed, the RAM speed, the processing grid size and even the actual area! When our area is masked, unwrapping is fast for some tiles and we can use more processing threads than we have CPU cores.

@SteffanDavies
Contributor Author

SteffanDavies commented Oct 11, 2022

@SteffanDavies In my example notebooks I define these parameters separately for each case, with lots of notes for the user. The target is a configuration where all your CPU cores are busy and your RAM amount is sufficient. I'm sure you can't do it for every case automatically because it depends on the CPU core count, the RAM amount, the CPU speed, the RAM speed, the processing grid size and even the actual area! When our area is masked, unwrapping is fast for some tiles and we can use more processing threads than we have CPU cores.

This could be a case of premature optimization. Having the option is better than having no solution at all just because it doesn't cover every case. User discretion is always advised.

@AlexeyPechnikov
Contributor

@SteffanDavies Unwrapping without tiling is the solution for most cases :) When you need to process a 20,000 x 10,000 grid, you need to know more than when you just process a single subswath with decimation. You are optimizing your own case, which is outside the scope of how most users use GMTSAR. With tiled unwrapping on a slow HDD or on Google Colab, users will really curse you because it always works too SLOWWWWWWW. In PyGMTSAR I just provide the user an easy interface to define any SNAPHU parameter, plus some example configurations.

@AlexeyPechnikov
Contributor

@SteffanDavies "As I only have 250 GB of RAM" - that's equal to 16 to 32 common user computers. Please do not optimise GMTSAR to run "only on 250 GB RAM" because it will be useless software!

@SteffanDavies
Contributor Author

@SteffanDavies Unwrapping without tiling is the solution for most cases :)

This is why by default there should be no tiling. That's fine. Only tile when the user specifies it.

You are optimizing your own case, which is outside the scope of how most users use GMTSAR.

Optional arguments are there for specific cases and users. There is little cost in adding the feature but a huge benefit for some cases.

In PyGMTSAR I just provide the user an easy interface to define any SNAPHU parameter, plus some example configurations.

This is the correct approach. Like you said, most users will not use tiles, but for the ones that need it, it is there. It should be the same for standard GMTSAR.

It looks like you are assuming I want to force tiling on most users, why??

@AlexeyPechnikov
Contributor

@SteffanDavies Please understand me correctly - it's much easier to adjust the SNAPHU calls in the GMTSAR shell scripts than to select the right tiling configuration for a real case. It's not a problem to use tiling, but it is a problem to use tiling correctly.

@AlexeyPechnikov
Contributor

@SteffanDavies One more point - how are you going to define other SNAPHU parameters like the tile processing directory? It might be on a RAM drive in your case, or maybe on another disk. For tiled unwrapping SNAPHU prints some suggestions and even requirements - and you need to tune some additional parameters before you will be able to start the processing. So you need to surface these SNAPHU output lines (how do you detect them, and how are you going to do it for parallel processing?) and allow the user to re-define any SNAPHU parameter.

@SteffanDavies
Contributor Author

@SteffanDavies Please understand me correctly - it's much easier to adjust the SNAPHU calls in the GMTSAR shell scripts than to select the right tiling configuration for a real case. It's not a problem to use tiling, but it is a problem to use tiling correctly.

I would say that expecting users to manually edit GMTSAR scripts is actually the wrong approach to software development and makes it much harder to operate the software. GMTSAR is already difficult to work with compared to other alternatives (such as SARPROZ), and on top of that users are expected to edit source code instead of being given a simple option to wrap Snaphu params? Even giving the user the ability to define these in a config file (like batch_tops.config for preproc_batch_tops.csh, for example) is far superior to editing the original source code / shell scripts!

One more point - how are you going to define other SNAPHU parameters like the tile processing directory? It might be on a RAM drive in your case, or maybe on another disk. For tiled unwrapping SNAPHU prints some suggestions and even requirements - and you need to tune some additional parameters before you will be able to start the processing. So you need to surface these SNAPHU output lines (how do you detect them, and how are you going to do it for parallel processing?) and allow the user to re-define any SNAPHU parameter.

I guess a config file would be acceptable for this, no?

@AlexeyPechnikov
Contributor

AlexeyPechnikov commented Oct 11, 2022

I guess a config file would be acceptable for this, no?

So you are going to provide script parameters, but to use them the user will need to manually check the SNAPHU log files and manually rewrite the SNAPHU configuration file! That doesn't look like a helpful option. I suppose that if you try SNAPHU tiled unwrapping on many examples, you will probably think twice about exposing the feature by just adding some optional parameters, without any tools to check and debug the processing and without an easy way to redefine any SNAPHU parameter. As a result, tiled unwrapping will produce puzzling crashes and wrong results when SNAPHU doesn't work with the combination of user-defined and default parameters and prints suggestions and requirements to the processing logs. At the end of the day, I resolved it by defining a custom config where all the parameters can be redefined, and by running a single interferogram unwrapping to test the processing and the results before the parallel unwrapping… but that doesn't work for a batch GMTSAR processing pipeline. It'd be nice if you have a solution, but I'm afraid you'll just spend your time and the new option will mostly be frustrating for users. Anyway, please think about the possible problems before you start to code the solution.

GMTSAR is hard to use and it's not actually suitable for high-end hardware configurations or clusters. Just as an example, there is no fault-tolerance mechanism for hardware or random software errors in parallel processes, which is important for your 32 cores and especially for non-ECC RAM. There is no resource management between the parallel processes, and so on and so on. I don't know how you could resolve these problems in the current software architecture. Actually, I started from a set of bash scripts to replace the GMTSAR shell scripts and ended up with PyGMTSAR, which is built on a cluster-capable, fault-tolerant resource manager, and I can already work on the same tasks that you run on 32 cores and 256 GB RAM using an Apple Air with 24 GB RAM and without so many hassles. Right now I'm building and testing Docker images for GMTSAR, as I did for PyGMTSAR, to simplify deployment. There are some compilation and other issues on different hardware platforms which you ignore because, although important for deployment, they are not interesting and not related to your current tasks. Sorry, but an additional SNAPHU option (which requires so much debugging skill from the users, as we discussed above) can't help any more! Allow the user to install GMTSAR in just one click on any hardware platform and OS, and the user will be glad to change a few lines of code to process a huge stack of huge interferograms on a paid project. In my opinion it'd be valuable for the GMTSAR project itself if you make some reproducible test cases and document your parameters, but one more (or, even worse, more than one) not-well-tested script option will produce even more disappointment.

I'm sorry for such a long answer; my English is far from perfect and I'm afraid it sometimes sounds rude, but I only tried to explain my technical vision. I spent enough time resolving what I think is the same problem, and it was impossible to do in a way that satisfies me with the GMTSAR shell scripts. I hope you'd do it better.

@SteffanDavies
Contributor Author

SteffanDavies commented Oct 11, 2022


I appreciate your insight and obviously you have experience with this issue.

As you say, GMTSAR is not yet mature for parallel processing. I have even had to make multiple changes myself to make some tasks run many times faster, especially considering I run multiple SSDs and HEDT / EPYC CPUs, so I/O and thread count are not bottlenecks for me; RAM is probably the primary factor in unwrapping and in sbas_parallel without the slower -mmap option. Of course it is possible to go up to 1 TB of RAM on cloud servers, but it starts to get a bit excessive in terms of cost and hardware efficiency. I've had sbas_parallel consume almost 300 GB of RAM on 32 cores, if I remember correctly.

I have seen many users on this forum ask why unwrapping is very slow for a large AOI, taking many days, and then atmospheric correction sometimes many more.

However, it looks like this project has made some advances towards parallelization and that's great.

@SteffanDavies
Contributor Author

SteffanDavies commented Oct 11, 2022

I will provide one such example:

image

This is a 16-thread system running the standard GMTSAR unwrap_parallel.csh. 16 threads is not much considering it is standard for current desktop workstations (such as the Ryzen 1700X, which is already 5 years old...). I am already consuming 150 GB of RAM, which is beyond the limit of even high-end motherboards (128 GB). So even with access to standard hardware, RAM will severely limit CPU potential.

Also notice how these have been unwrapping for 183 hours.

@AlexeyPechnikov
Contributor

@SteffanDavies I follow your changes and this one is very close to my recent attempts to automate tiled unwrapping :) Some days ago I removed the tiling from the Google Colab notebook because it works 10-100 times slower; 24 GB RAM is more than enough for this case, but the disk storage is slow, so the disk-stored tile processing files are the source of the problem. There are lots of sometimes unpredictable factors for tiled processing to succeed. The atmospheric correction is a black box because we have no way to select the right parameters a priori... I have seen many users on this forum ask about the right ones :) I think it means we need more documentation and examples and less code. With well-known Gaussian filtering (detrending) and a well-known correlation-weighted least squares solution I get better results (the example notebook comparing the common weighted least squares solution and GMTSAR SBAS is shared on Google Colab) than using the hard-to-understand atmospheric correction (ok, it's not so hard if we check the papers, but in this case it's surprising that we spend so many resources to converge the measurement to a linear trend... and miss all seasonal and other valuable changes). Is there any reproducible example where the atmospheric correction works well?.. I think not. And how could we argue that users should use the slow and time/RAM/CPU-consuming GMTSAR SBAS with magic parameters when the result doesn't look valuable? Obviously, that's not a problem which can be resolved by new code additions. At the same time, I found lots of insights in the forum discussions, but these insights are hard or impossible to realise using upstream GMTSAR.

@Xiaohua-Eric-Xu
Member

It's actually patch-to-patch processing, which usually refers to pair-wise processing, while batch processing usually means working on a stack of data. You may have noticed that I added an option to p2p_processing.csh a couple of months ago called skip_master, which can either process the master image only or assume the master image is done and proceed with the aligned image. Hopefully this'll lead to a universal batch code. It'll still need a couple of steps, but those scripts could be easily connected later on.

@AlexeyPechnikov
Contributor

@SteffanDavies Ok, 63 merged interferograms are ready in ~8 hours using 4 CPU cores only (due to RAM limitations):

gmt grdinfo F12_20220109_20220121_corr.grd 
...
F12_20220109_20220121_corr.grd: x_min: 0 x_max: 45124 x_inc: 4 name: x n_columns: 11281
F12_20220109_20220121_corr.grd: y_min: 0 y_max: 25617 y_inc: 1 name: y n_rows: 25617
...

It'd be ~8 times faster on your 32 CPU machine. That means you need ~3 hours to process 2 subswaths and 2 scenes together on your single host.

Use the stack correlation to unwrap only well-coherent pixels (black areas excluded):

corr_stack = corrs.mean('pair').compute()
corr_mask = xr.where(corr_stack >= CORRSTACKLIMIT,1,0)

image

SNAPHU custom configuration defined below:

conf = sbas.PRM().snaphu_config(
NTILEROW=16, NTILECOL=32, ROWOVRLP=200, COLOVRLP=100,NPROC=8,
TILECOSTTHRESH=250, MINREGIONSIZE=200, SINGLETILEREOPTIMIZE=False
)

SINGLETILEREOPTIMIZE is disabled because I don't have enough RAM for it and it's mostly useless on my tests.

image

A single interferogram unwrapping requires 14 minutes on an 8-core, 16 GB RAM Apple Silicon iMac. On your 32-core configuration (and almost unlimited RAM) it'd be ~3 minutes for a grid 2 times larger than yours. So you could unwrap ~200 interferograms for two merged scenes in one day (10 hours) on a single machine. And you don't need to run separate processing on two hosts and merge the separately unwrapped results.

But you prefer to spend a whole week on the computation on the two hosts and merge the results manually instead of a single day on one of them, and you even reproach me for premature optimisation. I really wonder what you mean, when I can get the results on an Apple Air faster than you can on your mini datacenter with 32+16 CPU cores and almost 1 TB RAM :)

SBAS processing with atmospheric smoothing is not challenging either. I'll start unwrapping all the interferograms (63) overnight to check SBAS processing tomorrow.

P.S. By the way, I need to optimise my function for the reverse geocoding grid calculation because it requires 42 GB RAM for 2 subswaths and 2 scenes at 15 m resolution :)

@AlexeyPechnikov
Contributor

@Xiaohua-Eric-Xu I think there are many advantages to the current stack-based processing. Why do you want to drop it? It's possible to optimise the stacked processing by excluding SLC file creation and using on-the-fly processing on the source GeoTIFF and NetCDF scenes (both data formats allow cloud-optimised tiled reading). The main GMTSAR problem is obviously the GMT-based realisation, but your algorithms are great.

@SteffanDavies
Contributor Author


Are these 14-minute unwrapping times based on the Snaphu tiling? How long would it take to unwrap 1 interferogram without tiles?
The reason it takes me a week to unwrap is precisely because I don't use tiling at all. It's not about preference; nobody likes waiting (unless you mean a preference for using GMTSAR). Of course, if I tiled with Snaphu it would reduce my time by several days, though still not as fast as your solution, obviously.

Good work.

@AlexeyPechnikov
Contributor

Yes, sure, this is an illustration of the SNAPHU tiling approach - see the SNAPHU config parameters in my post. We could make it faster, because even tiles with small overlaps can be merged together; we just need to rewrite or optimize the SNAPHU SINGLETILEREOPTIMIZE post-processing. I see it as a full, well-decimated interferogram unwrapping plus a tiled, precise interferogram unwrapping, then splitting into spatial components and checking them to detect and fix tile-related issues. Also, you could use the SINGLETILEREOPTIMIZE option as is because you have enough RAM for it.

Hmm, I didn't try to run SNAPHU without tiling on this grid because the processing would use just one CPU core and lots of swap.

Anyway, you still have a lot of disk operations and need to tune the tiling parameters. Note that I can do it interactively using PyGMTSAR, but you can't do the same using GMTSAR batch processing. For the tiled unwrapped output I can do (in just one line of code) detrending with Gaussian filtering and map the result to check whether the tiling works well. I have no idea how you could put all the necessary possibilities into new arguments for the GMTSAR scripts… I'm afraid that without these debugging and visualization tools you will be stuck launching the full processing again and again to produce some valuable results. I tried to add similar options to my bash-script-based SBAS GMTSAR pipeline and it was not a good way.

@chintals
Contributor

chintals commented Oct 12, 2022

Looks sort of OK but with 2x2 tiles. However it's not even reducing the time by a factor of two. (11min vs 6min) Could be problem size related.

image

snaphu_interp.csh.txt

@Xiaohua-Eric-Xu I see that you enabled SNAPHU tile unwrapping with this command.

snaphu phase.in `gmt grdinfo -C phase_patch.grd | cut -f 10` -f $sharedir/snaphu/config/snaphu.conf.brief -c corr.in -o unwrap.out -v -s -g conncomp.out $TILE

I am curious to see the SNAPHU log, because single tile re-optimization is enabled with a -S flag

snaphu phase.in `gmt grdinfo -C phase_patch.grd | cut -f 10` -f $sharedir/snaphu/config/snaphu.conf.brief -c corr.in -o unwrap.out -v -s -g conncomp.out $TILE -S

I had mentioned previously in another thread that when you enable the -S flag, SNAPHU optimization runs on each tile individually and uses much less memory and time compared to serial SNAPHU optimization on a complete grid (or single tile). After individually unwrapping the tiles, it uses the overlap (in pixels) parameter to estimate the offset between adjacent tiles, and then reconstructs a merged unwrapped grid, which is then used as initialization for single-tile unwrapping - so essentially unwrapping twice.

For a two-frame SLC with 2 subswaths, it reduced unwrapping from 36+ hours serial to ~2 hours in parallel (dense vegetation + poor coherence). The parallel portion of SNAPHU only runs for the first step, where each individual tile is unwrapped. After the individual tiles are merged, the second unwrapping is essentially single core. It is widely mentioned on forums that the ideal pixel number is around 2000 to 3000 pixels per core, which could be used as guidance to estimate the number of parallel instances.

The tile control parameters can be set in the snaphu.conf file (on the fly, by editing the conf file to uncomment and update the tile numbers - this can be done with another csh file), and the -S flag invoked in the csh script when n_proc is set to any number other than 1 (which could live in a parameter file such as the batch_tops config or so); see the sketch below. snaphu_interp.csh would not require many changes. You can have the CPU count set to 4 but the number of tiles much higher, and SNAPHU will queue the tile unwrapping.
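A hedged sketch of that idea (the keyword names are the standard SNAPHU config keywords already mentioned in this thread; the file names, values and the assumption that the brief config does not already set these keywords are illustrative only):

# write tiling keywords into a copy of the brief config, then unwrap with -S
cp $sharedir/snaphu/config/snaphu.conf.brief snaphu_tiles.conf
cat >> snaphu_tiles.conf << EOF
NTILEROW  4
NTILECOL  4
ROWOVRLP  400
COLOVRLP  400
NPROC     4
EOF
snaphu -s phase.in $(gmt grdinfo -C phase_patch.grd | cut -f 10) -f snaphu_tiles.conf \
       -c corr.in -o unwrap.out -v -g conncomp.out -S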

@chintals
Contributor

@Xiaohua-Eric-Xu I think there are many advantages to the current stack-based processing. Why do you want to drop it? It's possible to optimise the stacked processing by excluding SLC file creation and using on-the-fly processing on the source GeoTIFF and NetCDF scenes (both data formats allow cloud-optimised tiled reading). The main GMTSAR problem is obviously the GMT-based realisation, but your algorithms are great.

I agree GMTSAR algorithms are great! But serial GMT operations with heavy I/O and serial C instances are bottlenecks. GMTSAR algorithms are ideal for learning and debugging; scaling and speedup would require customization on the user end.

@chintals
Contributor

It took me forever to finish reading through. These are interesting and helpful discussions. I agree that the software development has quite a lot of limitations, mostly due to the current architecture of the code and the old shell scripting. Slowly we are thinking of moving toward a universal p2p_ set of code and a universal batch code, which hopefully will be controlled by just one config file. The biggest challenge is, of course, that things change from satellite to satellite. Most of you have dealt with Sentinel-1 data and haven't yet touched the older generations of SAR data, where there are already tons of problems. We have in our scope to implement a set of Python modules to replace the shell scripts. Hopefully some of the issues can get addressed through that process. It takes time to happen...

And don't forget upcoming satellites, NISAR :)

@AlexeyPechnikov
Contributor

@chintals

GMTSAR algorithms are ideal for learning and debugging; scaling and speedup would require customization on the user end.

As I show in PyGMTSAR, the base GMTSAR tools are much faster than you might think! And how about the new way to distribute GMTSAR for end users on Linux/MacOS/Linux arm64/amd64,... without so much hassle with installation: #484

@Xiaohua-Eric-Xu
Member

@Xiaohua-Eric-Xu I think there are many advantages to the current stack-based processing. Why do you want to drop it? It's possible to optimise the stacked processing by excluding SLC file creation and using on-the-fly processing on the source GeoTIFF and NetCDF scenes (both data formats allow cloud-optimised tiled reading). The main GMTSAR problem is obviously the GMT-based realisation, but your algorithms are great.

No, we don't want to drop it. The shell scripts will always be there, and the newer development will focus on breaking the limitations and making things easier to use for non-Linux-gurus.

@AlexeyPechnikov
Contributor

@SteffanDavies 63 interferograms tiled unwrapping completed in 14 hours:

image

@SteffanDavies
Contributor Author

@SteffanDavies 63 interferograms tiled unwrapping completed in 14 hours:

image

I am currently testing PyGMTSAR with tiling on a test stack but am running into issues; can you take a look at what I posted in your repository?

@AlexeyPechnikov
Contributor

@SteffanDavies yes, sure, thanks for all the issue reports!

@AlexeyPechnikov
Contributor

@Xiaohua-Eric-Xu @SteffanDavies I have some results about unwrapping tricks. As SNAPHU doesn't support NaN values in the input grids, we need to fill data gaps with zeroes. There are two ways here: mask the coherence only (i.e. the phase is not valid where the coherence is zero), or zero-mask both the coherence and phase grids (i.e. the masked area has no measurable changes). It's interesting that the second way is significantly faster and more stable (in terms of both the difference in results and the processing time) at the same time (intuitively the first one looks more stable, I think).

Zero-mask coherence only

The results are not stable for different tile configurations (two different results selected):

conf = sbas.PRM().snaphu_config(NTILEROW=16, NTILECOL=16, ROWOVRLP=200, COLOVRLP=200,SINGLETILEREOPTIMIZE=False)
CPU times: user 1min 29s, sys: 28.3 s, total: 1min 57s
Wall time: 41min 9s

image

conf = sbas.PRM().snaphu_config(NTILEROW=16, NTILECOL=8, ROWOVRLP=200, COLOVRLP=200, SINGLETILEREOPTIMIZE=False)
CPU times: user 2min 2s, sys: 37.5 s, total: 2min 39s
Wall time: 56min 34s

image

Zero-mask coherence and phase

The results are stable for different tile configurations (the same configurations as in the 1st case above):

conf = sbas.PRM().snaphu_config(NTILEROW=16, NTILECOL=16, ROWOVRLP=200, COLOVRLP=200,SINGLETILEREOPTIMIZE=False)
CPU times: user 37.3 s, sys: 15.3 s, total: 52.6 s
Wall time: 14min 33s

image

conf = sbas.PRM().snaphu_config(NTILEROW=16, NTILECOL=8, ROWOVRLP=200, COLOVRLP=200, SINGLETILEREOPTIMIZE=False)
CPU times: user 45 s, sys: 16.7 s, total: 1min 1s
Wall time: 15min 49s

image

The same interferograms unwrapping as above

Using the 2nd approach, all 63 interferograms (2 subswaths x 2 scenes on a grid of size 11157 x 25616) unwrapped in 10 hours on an iMac with 16 GB RAM.

Unknown

GMTSAR way

GMTSAR mixes the two approaches across the set of applied masks. I'm puzzled about the reasons for using different approaches for the user-defined mask and the landmask (there are 3 masks in total, as I remember).

@Xiaohua-Eric-Xu
Member

Xiaohua-Eric-Xu commented Nov 1, 2022

There are two ways here: mask the coherence only (i.e. the phase is not valid where the coherence is zero), or zero-mask both the coherence and phase grids (i.e. the masked area has no measurable changes). It's interesting that the second way is significantly faster and more stable (in terms of both the difference in results and the processing time) at the same time (intuitively the first one looks more stable, I think).

This is expected. For most unwrapping algorithms that play with residues, the number of residues basically determines how long the algorithm will run. The first approach does not reduce residues; it only changes the weight (coherence), which tells the algorithm that these places are not as important but still need a solution. The second approach removes pixels that are bad and will reduce the number of residues.

For interseismic or slow-changing scenarios, the second is always recommended (similar to snaphu_interp.csh). However, if you would like to solve for unreliable locations, e.g., near an earthquake rupture, the first is a better masking approach, though it takes longer to run (snaphu.csh). For cases with good coherence, the two are about the same.
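For reference, a minimal GMT grdmath sketch of the two masking variants (the file names and the 0.10 coherence threshold are illustrative; NaNs are replaced with zeros since SNAPHU does not accept them):

# 0/1 mask from a coherence threshold; NaN -> 0
gmt grdmath corr.grd 0.10 GE 0 DENAN = mask.grd
# variant 1 - zero-mask the coherence only; phase gaps are just filled with zeros
gmt grdmath corr.grd mask.grd MUL 0 DENAN = corr_masked.grd
gmt grdmath phasefilt.grd 0 DENAN = phase_filled.grd
# variant 2 - zero-mask both coherence and phase, which also removes residues in the masked areas
gmt grdmath phasefilt.grd mask.grd MUL 0 DENAN = phase_masked.grd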

@AlexeyPechnikov
Contributor

@Xiaohua-Eric-Xu It's still not obvious to me why the fastest solution is the most accurate too. When we resolve the low-coherence areas, the result is (potentially) more accurate because we keep the changes in these areas, but actually the solution is not stable even for large, well-coherent areas. And the numerical instability is not related to the rectangular tiles here. While I don't know which solution is more accurate, I always prefer the more stable and predictable numerical solution, of course. Anyway, SNAPHU tiled unwrapping looks good, I think.

Maybe you know some alternative unwrapping engine to compare the results against? The SNAPHU code looks too weird for me. Mostly it works very well, but it's extremely hard to debug some issues. NODATA value support would be the key to simply ignoring some areas, although it might produce a set of disconnected, separately unwrapped areas.

@Xiaohua-Eric-Xu
Member

It's still not obvious to me why the fastest solution is the most accurate too

They may not be. Your definition of "accurate" seems to be "spatially smooth", while a lot of natural processes may give you "non-smooth" signals, e.g., earthquakes. And since phase unwrapping is basically solving for the path along which to integrate phase, fewer residues usually means the path is easier to solve (or optimize) and more predictable.

The SNAPHU code looks too weird for me.

I heard there are quite a number of Matlab codes for phase unwrapping, but I am not sure which one to use.

@AlexeyPechnikov
Contributor

@Xiaohua-Eric-Xu That's not related to visual properties (like smoothness) because I mean the numerical stability only - fundamentally, we can't trust an algorithm which produces very different results for slightly different processing parameters. In practical usage it's more complicated, because we can have almost the same detrended results for different unwrapped phases obtained for the same interferogram using different parameters:

image

image

You always advise using detrending, and the results above look like proof of why detrending is so valuable (for resolving unwrapping issues).

@SteffanDavies Maybe you could compare your unwrapped, detrended results for different unwrapping parameters and resolutions? It'd be insightful to know how it works for your case.

@Xiaohua-Eric-Xu
Member

we can't trust an algorithm which produces very different results for slightly different processing parameters.

Yeah, maybe phase unwrapping is more like art :). Especially for an area filled with residues.

I guess the major problem is that we don't really know the ground truth, i.e. the real propagation delay that includes deformation, atmospheric noise, orbit error, ... So when you show different solutions from phase unwrapping, there could always be an argument.

@AlexeyPechnikov
Contributor

What do you think - is it valuable to test PyGMTSAR on more than 2 full scenes (6 subswaths in total, grid size 25617 x 17029)? Potentially I think we can process 5 full scenes using the GMTSAR phasediff & phasefilt binaries on an Apple Air with 16 GB RAM, but I'm not sure if the case is practically useful.
For now, the new PyGMTSAR development version works well for 2 full scenes and 63 interferograms on a 16 GB RAM host, using SNAPHU tiling of course:

image

image

@Xiaohua-Eric-Xu
Member

I'm not sure if the case is practically useful.

It'll be useful if someone would like to investigate the deformation over a large area such as the Tibet plateau.

@AlexeyPechnikov
Contributor

Thanks, understood. I suppose 30-meter resolution is more than enough for the plateau deformation analysis, right? We can merge 8 full scenes at 30 m resolution into the same grid size as 2 scenes at 15 m resolution. Actually, I didn't think before that someone could use 15-meter resolution for a large-area analysis, but road analysis is such a case.

@AlexeyPechnikov
Contributor

@SteffanDavies @Xiaohua-Eric-Xu A recent PyGMTSAR update allows processing the examples using 4 GB RAM in a Docker container on Windows/MacOS/Linux/etc. (where the RAM and swap are hard limited); see the image on DockerHub: https://hub.docker.com/r/mobigroup/pygmtsar I could build a Docker image for processing 2 stitched scenes on 8 or 16 GB RAM if it'd be helpful (an 8 GB RAM limit allows running the processing on 10-year-old laptops). Please let me know what you think about it.
full small

@AlexeyPechnikov
Contributor

@SteffanDavies @Xiaohua-Eric-Xu

full_small

PyGMTSAR (Python GMTSAR) - Sentinel-1 Satellite Interferometry (SBAS InSAR) for Experts - see the Live Jupyter Notebooks to learn how to process 34 large interferograms at a high resolution of 15 meters (2 stitched SLC scenes with 3 subswaths each and a grid size of ~25000x25000 pixels) using only 16 GB RAM. I tested this example on an Apple Air laptop and even in a reproducible Docker container (to be sure swap is not used) and it works well. Use the 50 GB Docker image available on DockerHub, or download the example dataset (50 GB zipped) and the notebooks as described at the link: https://hub.docker.com/repository/docker/mobigroup/pygmtsar-large

@AlexeyPechnikov
Contributor

AlexeyPechnikov commented Dec 2, 2022

@SteffanDavies Processing 34 interferograms requires 2 days on an Apple Air in a 16 GB RAM Docker container, and it is about 2 times faster without the container. So your ~200 interferograms could be processed in ~1 week on an Apple Air with 16 GB RAM, instead of the separate-scene processing on your two hosts (32 cores 512 GB RAM + 16 cores 256 GB RAM), which requires the same time.

@SteffanDavies
Contributor Author


@chintals You mention enabling the -S flag when --nproc > 1; however, I believe it can be more efficient in some cases to run parallel tiled unwrapping of multiple interferograms using only 1 CPU for each unwrapping process.

This is because, at least in my tests, snaphu does not hold 100% CPU utilization all of the time when unwrapping a tiled interferogram using multiple cores, whereas with a single core it stays at 100% and also does not need to wait for the final tiles to finish before moving on to the next interferogram.

See here for example:

image

I am unwrapping 16 interferograms in parallel using 1 core each and a custom tile configuration that gives me between 2000-3000 pixels per tile. Memory usage can spike, which I assume is due to the -S single-tile unwrapping.

The idea is to maximize CPU usage at all times while keeping RAM usage manageable.
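A hedged sketch of that layout (the directory layout, file names and tile counts are illustrative; GNU xargs is assumed):

# unwrap many interferogram directories in parallel, one single-threaded tiled snaphu each
ls -d intf/*/ | xargs -P 16 -I{} sh -c 'cd {} && \
  snaphu -s phase.in $(gmt grdinfo -C phasefilt.grd | cut -f10) -c corr.in -o unwrap.out \
         --tile 8 8 200 200 --nproc 1 -S'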

@AlexeyPechnikov
Contributor

@SteffanDavies You could use 1.5N SNAPHU processes on an N-core CPU for effective CPU utilization. Also, the main complexity is selecting the right number of SNAPHU processes and tiles for your available RAM. See the Docker image for an illustration: https://hub.docker.com/r/mobigroup/pygmtsar-large There is no way to do it in GMTSAR batch processing because we need to perform multiple test runs and check the SNAPHU logs and output results. Are you ready to start the full processing, including interferogram creation, for every test? Or maybe are you going to do the tests manually on the command line, substituting all the parameters by hand? Sure, that's not a reproducible way which can be used regularly. While you have no reproducible workflow which is completely coded in a single script or Jupyter notebook, it does not work for users. Probably, you will not be able to repeat your own steps a few months later.
