The controller runs so much faster internally than a round trip to the I/O that, if you need speed, block reads and writes to the I/O are by far the most efficient approach. On the other hand, the absolute simplest way to write code in Opto is direct comms to the I/O, and if the program is not large and does not need loop times under 100 ms, this is acceptable. A former Opto employee claims he always programs the R1 this way and does not see a problem. My take is that if your program turns out to be bigger, and required to be faster, than you thought, changing it is a pain. Therefore I generally stick with the same method of block R/W.
As far as checking your writes goes, I always assume that if I previously turned an output on, it is on, so you can rely on the state of the internal I/O variable when checking status in internal condition commands. This saves considerable time. Of course you can also (although it makes the code more complex) create a separate set of like-named variables just for reading back the state of the outputs, distinct from the vars used to write them. This lets you check the actual status, but I consider this completely unnecessary.
I asked Opto early on whether using the extra Ethernet port was faster and was told essentially no. I suspect this is because there is only one CPU, so the two Ethernet stacks only get one slice each. Also, most people do not appreciate how fast 100 Mb/s is. Since the I/O is transmitted via UDP, the time required to send a little packet is in the low microseconds, and the time for each bit is 10 nanoseconds. Using both ports for the sake of speed is not necessary unless there is a lot of traffic on the network you are using. Make sure the network is clean first.
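To put rough numbers on the wire time, here is a back-of-envelope sketch in Python. It assumes a 100 Mb/s link and a minimum-size 64-byte Ethernet frame plus the 8-byte preamble and 12-byte inter-frame gap; the function names are made up for illustration, and it only covers time on the wire, not stack or module latency.

```python
# Back-of-envelope wire time on 100 Mb/s Ethernet (assumed link speed).
LINK_BITS_PER_SECOND = 100_000_000

# The smallest Ethernet frame is 64 bytes. The preamble/SFD (8 bytes) and
# the inter-frame gap (12 bytes) also occupy the wire for every frame.
MIN_FRAME_BYTES = 64
PER_FRAME_OVERHEAD_BYTES = 8 + 12


def bit_time_ns(bits_per_second: int) -> float:
    """Time to put one bit on the wire, in nanoseconds."""
    return 1e9 / bits_per_second


def wire_time_us(frame_bytes: int, bits_per_second: int) -> float:
    """Time one frame occupies the wire, in microseconds."""
    total_bits = (frame_bytes + PER_FRAME_OVERHEAD_BYTES) * 8
    return total_bits * 1e6 / bits_per_second


one_bit_ns = bit_time_ns(LINK_BITS_PER_SECOND)
small_packet_us = wire_time_us(MIN_FRAME_BYTES, LINK_BITS_PER_SECOND)

print(f"bit time: {one_bit_ns:.1f} ns")
print(f"minimum frame on the wire: {small_packet_us:.2f} us")
```

So a small request and its reply together spend well under 20 microseconds on the wire, which is tiny next to a loop time measured in milliseconds.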
Whether or not to create a separate chart for I/O depends on your strategy. My personal opinion is that if that chart is running fast enough it may not matter; otherwise it can affect timing. Also, it is one more chart running all the time at a high slice rate, using up more CPU time than if you included that work in your main chart. Chart switching takes time, so I use one chart if possible, with the I/O read at the top of the loop and the write at the bottom. This is the most efficient method and guarantees sequential operation, which makes it much easier to troubleshoot. Remember, this is a single CPU, so everything happens in a single thread in terms of CPU time.
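The read-at-top, write-at-bottom loop looks roughly like the sketch below. It is written in Python rather than OptoScript, and everything in it (SimulatedRack, scan_once, pump_logic, the point names) is a hypothetical stand-in, not an Opto 22 API. The point is only the shape: one block read, logic that touches internal variables only, one block write.

```python
from typing import Callable, Dict

IoImage = Dict[str, bool]


class SimulatedRack:
    """Hypothetical stand-in for an I/O unit; counts network transactions."""

    def __init__(self, inputs: IoImage) -> None:
        self.inputs = dict(inputs)
        self.outputs: IoImage = {}
        self.transactions = 0

    def read_block(self) -> IoImage:
        """One transaction that returns every input at once."""
        self.transactions += 1
        return dict(self.inputs)

    def write_block(self, outputs: IoImage) -> None:
        """One transaction that sends every output at once."""
        self.transactions += 1
        self.outputs.update(outputs)


def scan_once(rack: SimulatedRack,
              logic: Callable[[IoImage, IoImage], None],
              outputs: IoImage) -> None:
    inputs = rack.read_block()   # top of loop: block read
    logic(inputs, outputs)       # logic works on internal variables only
    rack.write_block(outputs)    # bottom of loop: block write


def pump_logic(inputs: IoImage, outputs: IoImage) -> None:
    if inputs["stop_pb"]:
        outputs["pump_run"] = False
    elif inputs["start_pb"]:
        outputs["pump_run"] = True
    # The lamp follows the internal copy of the output; nothing is read
    # back from the rack to find out whether the pump is on.
    outputs["run_lamp"] = outputs["pump_run"]


rack = SimulatedRack({"start_pb": True, "stop_pb": False})
outputs = {"pump_run": False, "run_lamp": False}
scan_once(rack, pump_logic, outputs)
print(rack.outputs, rack.transactions)
```

However many points the logic touches, each pass costs two transactions, and the order of operations is the same every scan.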
I do advocate the use of block reads and direct writes. This simplifies the code, since the number of output writes is much smaller than the number of input reads you need. You can also read the status of the outputs with every read cycle, so there is no extra hit on CPU time, and you can check the last status of each output internally. Unless the strategy is large and requires speed, this is a good method.
Using an Int32 for status flags is very efficient and does provide a means of checking the status of all 32 I/O points at one time. However, I find that I do that very seldom, so I find like-named variables (reflecting the I/O) much easier to use, and an Int32 will only hold one 32-point module, which complicates the issue. If you are not using the Excel method of creating the rack-table-to-variable script (and back), then you should check it out. It makes it easy to create a perfect set of script commands to load/unload the R/W of the rack: once you have completed the spreadsheet, you can paste the scripts for all 1000 I/O vars into a block and get a compile with no spelling errors.
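For anyone weighing the two styles, here is a small Python sketch of the Int32 flag approach next to the like-named-variable approach. The helper names and the point names are hypothetical; the only thing assumed is that bit n of the word mirrors channel n of one 32-point module.

```python
from typing import Dict, List

ALL_ON = 0xFFFFFFFF  # all 32 channels of one module on


def is_on(status_word: int, channel: int) -> bool:
    """True if the given channel (0-31) is set in the status word."""
    return bool((status_word >> channel) & 1)


def set_channel(status_word: int, channel: int, on: bool) -> int:
    """Return a new status word with one channel turned on or off."""
    mask = 1 << channel
    if on:
        status_word |= mask
    else:
        status_word &= ~mask
    return status_word & ALL_ON  # keep the result within 32 bits


def unpack(status_word: int, names: List[str]) -> Dict[str, bool]:
    """Spread the word into like-named variables, one per channel."""
    return {name: is_on(status_word, ch) for ch, name in enumerate(names)}


word = 0
word = set_channel(word, 0, True)   # channel 0 on
word = set_channel(word, 5, True)   # channel 5 on

# One comparison covers the whole module...
any_on = word != 0
all_on = word == ALL_ON

# ...while named variables read better in the logic itself.
points = unpack(word, ["pump_run", "run_lamp", "valve_a"])
print(word, any_on, all_on, points)
```

The whole-module tests are a single compare, but a rack with more than one module needs one word per module, which is where the named variables start to look simpler.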
One last thing: if you have done your homework and still cannot get enough speed, you can use SoftPAC, which I suspect runs at more than 10 times the speed of the S1. At that point the bottleneck becomes the EB1/EB2 and the speed of the I/O modules.